Rawhed's Tutorial #3:

32bpp Graphics Coding

[Introduction]
[32BPP Basics]
[Alpha Channel?]
[More 32BPP RGB]
[MMX Helps out]
[Conversion]
[Converting to 24BPP]
[Converting to 16BPP]
[Converting to 8BPP(mode13h)]
[Converting to TextMode]
[Closing Words]

---==[Introduction]==---------------------------------------------------------

This tutorial is based on how my current vesa/gfx engine works. I'd previously been doing just 16bpp graphics, and I had to code all of my routines for 16bpp. When I first started with 16bpp it was quite a novelty, but after a while (tut2) I was getting irritated with it and wanted a more flexible model. I saw demos which could run in TONS of modes like 8bpp, 15bpp, 16bpp, 24bpp, 32bpp and even textmode! A lot of demos could have their mode changed from the commandline, and I realised that this pure 16bpp model of mine was not so cool and very inflexible.

What this tutorial covers is a different way of coding gfx engines so that they can handle multiple color depths. Basically what happens is that you create all your memory buffers as if they were holding 32bpp graphics, and all of your internal graphics code works at the 32bpp level, and then finally when you want to flip the frame to the screen you just convert to the appropriate bpp level. So you could have conversion functions to convert between 32bpp-->16bpp and 32bpp-->8bpp, and then you would flip that into video memory.

I also had a lot of trouble finding out the mode number for 32BPP modes. All the vesa docs I read only had up to 24BPP. Eventually I found (from UNIVBE) that 320x200x32bpp is mode 146h.

I can't remember where I heard of this idea, but I do know that it's not original. In fact A LOT of demo groups use it. But since I couldn't find any tuts on it, and I thought it works very well, I wrote this tut. So let's go then.

---==[32BPP Basics]==---------------------------------------------------------

Although 32bpp allows way more colours than the other modes (16bpp etc) it is actually the easiest to code for! 15 & 16bpp modes are cool, but they only offer 32768 & 65536 colours, and they are difficult to work with because they have the RGB values packed into them (see tut1).

The 32BPP format is easy, and of course each pixel takes up 32bits(4 bytes) of memory. You have to be careful because a 320x200 surface can take up a lot more memory than lesser modes.


        320x200x32bpp - 256k / layer
        320x200x24bpp - 192k / layer
        320x200x16bpp - 128k / layer
        320x200x15bpp - 128k / layer     ;may as well use 16bpp huh? :)
        320x200x08bpp - 64k  / layer

So only 4 32bpp layers and you are using a MEG of memory!
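
The numbers in the table are just width x height x bytes per pixel, with "k" meaning 1000 bytes. Here is a small C sketch of that sum. The function name layer_bytes is made up for this example, and it assumes 15bpp pixels are stored in 16 bits like the table does.

```c
#include <stddef.h>

/* Bytes needed for one layer (hypothetical helper, not from my engine).
   The bit depth is rounded up to whole bytes, so 15bpp costs the same
   as 16bpp. */
size_t layer_bytes(size_t width, size_t height, unsigned bpp)
{
    size_t bytes_per_pixel = (bpp + 7) / 8;
    return width * height * bytes_per_pixel;
}
```

Calling layer_bytes(320, 200, 32) gives 256000 bytes, which is the 256k in the table.
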
Here is how the 4 bytes are structured:


        [1 byte] [1 byte] [1 byte] [1 byte]
        AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB
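
If you read the layout above as one 32bit value, from the top byte down, packing and unpacking a pixel could look something like this in C. This is only a sketch: the names pack_argb, argb_a, argb_r, argb_g and argb_b are made up for the example. Note that x86 is little-endian, so in memory the blue byte comes first.

```c
#include <stdint.h>

/* Build one 32bpp pixel: alpha in bits 24-31, red in 16-23,
   green in 8-15, blue in 0-7. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Pull the single channels back out again. */
uint8_t argb_a(uint32_t c) { return (uint8_t)(c >> 24); }
uint8_t argb_r(uint32_t c) { return (uint8_t)(c >> 16); }
uint8_t argb_g(uint32_t c) { return (uint8_t)(c >> 8); }
uint8_t argb_b(uint32_t c) { return (uint8_t)c; }
```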

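8 bits for the Alpha channel, 8 for red, 8 for green and 8 for blue. That is the pixel read as one 32bit value, from the top byte down; in memory (x86 is little-endian) the blue byte comes first. As you can see, you have the same range of RGB colours as you do in 24bpp. So why use 32bpp if it's just gonna take up more memory? Simple. First of all it's faster. Why would it be faster to read/write 4 bytes as opposed to 3? Basically every 32bpp pixel is one dword sitting on a 4-byte boundary, so it can be read or written in a single aligned access, while 3-byte pixels straddle those boundaries and have to be handled in pieces. Also, you don't get 24bit registers. For example: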


        ;24bpp clear screen(for arguments sake)
        mov edi,[dest]
        mov eax,[color]   ;24bit color, with upper 8bits=0
        mov ecx,64000     ;number of pixels for 320x200
        @slowloop:
        stosw             ;write 2 bytes (blue, green)
        ror eax,16        ;bring the red byte down into AL
        stosb             ;write 1 byte (red)
        ror eax,16        ;put EAX back for the next pixel
        dec ecx
        jnz @slowloop

        ;32bpp clear screen
        mov edi,[dest]
        mov eax,[color]   ;32bit color, with upper 8bits=0
        mov ecx,64000	  ;number of pixels for 320x200
        rep stosd	  ;loop writing 32bits/time

Because there is no easy way of writing 3 bytes at a time, it's much easier to write 4 bytes. Hence 32bpp modes :)
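
For comparison, here is roughly what the two clears look like in C. It is a sketch only: clear24 and clear32 are made-up names, and the 24bpp version assumes the bytes sit in memory as blue, green, red.

```c
#include <stddef.h>
#include <stdint.h>

/* 24bpp: three separate byte writes per pixel. */
void clear24(uint8_t *dest, size_t pixels, uint32_t color)
{
    for (size_t i = 0; i < pixels; i++) {
        dest[3 * i + 0] = (uint8_t)color;          /* blue  */
        dest[3 * i + 1] = (uint8_t)(color >> 8);   /* green */
        dest[3 * i + 2] = (uint8_t)(color >> 16);  /* red   */
    }
}

/* 32bpp: one dword write per pixel. */
void clear32(uint32_t *dest, size_t pixels, uint32_t color)
{
    for (size_t i = 0; i < pixels; i++)
        dest[i] = color;
}
```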

---==[Alpha Channel?]==-------------------------------------------------------

Well I must confess, at the time that I'm typing this I've never used the alpha channel, or really thought about what it could be used for... So I'm sort of gonna be making this up as I go along. But I'm sure you can think of groovy things to use it for. Having an extra 8 bits on your layers/surfaces is very handy indeed.

1]You could use it to define MANY characteristics of the surface pixels.


     Eg.
                A A A A A A A A
                7 6 5 4 3 2 1 0 bits
                | | | | | | | | 
                | | | | | | | |____Active
                | | | | | | | 
                | | | | |_|_| 
                | | | |   |________Draw style
                | | | |  
                |_|_|_|
                   |_______________Percentage Transparent
 
        !Active(0-1) - Whether the pixel is drawn/not.
                       Useful for images with holes in them.
                       Sort of like a built-in mask.

        !Draw style(0-7) - How to draw the pixel.
                           eg, 0=normal(opaque)
                               1=additive
                               2=subtractive
                               3=multiplication
                               4=difference
                               5=transparent
                               6=?
                               7=?

        !Percentage Transparent(0-15) - How transparent the pixel is.
                                        so 15=fully opaque, and 0=invisible.

This is just an example of one way you could do things. Although I think a simplified version of the above would be better for the realtime demos of today.
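
To make the bit layout above concrete, here is a C sketch that packs and unpacks those three fields. Everything here (PixelFlags, encode_alpha, decode_alpha) is invented for the example, just like the layout itself is only a suggestion.

```c
#include <stdint.h>

/* The three fields from the diagram above. */
typedef struct {
    unsigned active;   /* bit 0:    1 = pixel is drawn         */
    unsigned style;    /* bits 1-3: draw style, 0..7           */
    unsigned opacity;  /* bits 4-7: 0 = invisible, 15 = opaque */
} PixelFlags;

/* Split an alpha byte into its fields. */
PixelFlags decode_alpha(uint8_t a)
{
    PixelFlags f;
    f.active  = a & 1u;
    f.style   = (a >> 1) & 7u;
    f.opacity = (a >> 4) & 15u;
    return f;
}

/* Build an alpha byte; out-of-range inputs are masked down. */
uint8_t encode_alpha(unsigned active, unsigned style, unsigned opacity)
{
    return (uint8_t)((active & 1u) | ((style & 7u) << 1) |
                     ((opacity & 15u) << 4));
}
```

So an active, additive (style 1), fully opaque pixel would carry the alpha byte encode_alpha(1, 1, 15).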

2]You could keep things simple and just use the 8 alpha bits for doing your own internal transparency etc. This is probably what most people use it for. Very handy, but not something I've done myself.

---==[More 32BPP RGB]==-------------------------------------------------------

Ok, so now you know the format etc. Now to show you some nice things. Want to blend 2 RGB pixels together 50/50? Sure, easy - not like 16bpp. Clear the lowest bit of every byte, halve both colours, and add them:


        ;averaging 2 32bit colors (50/50 blend)
        ;the low bit of EVERY byte is masked off, alpha included, so that
        ;nothing gets shifted into the byte below it
        mov eax,[col1]
        mov ebx,[col2]
        and eax,11111110_11111110_11111110_11111110b
        and ebx,11111110_11111110_11111110_11111110b
        shr eax,1
        shr ebx,1
        add eax,ebx
        mov [edi],eax
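
The same mask, shift and add trick in C, as a sketch (avg_argb is a made-up name). It masks the low bit of all four bytes so that no bit can slide down into the neighbouring channel. Because each half is rounded down before the add, the result can come out 1 lower than the true average, so 255 blended with 255 gives 254.

```c
#include <stdint.h>

/* 50/50 blend of two 32bpp pixels, all four channels at once. */
uint32_t avg_argb(uint32_t a, uint32_t b)
{
    return ((a & 0xFEFEFEFEu) >> 1) + ((b & 0xFEFEFEFEu) >> 1);
}
```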

A very nice trick that I found was with MMX instructions. They have something which I found perfect for 32BPP functions. I'm not about to write an MMX tutorial :) so go and read another doc for that, but I want to introduce one MMX feature in particular. Saturating arithmetic.

Lets take a simple additive surface loop. Here you have 2 320x200x32bpp surfaces, both with pictures on them and you want to add them together.
Eg:


        //pseudo code
        long col1,col2,colf;
        col1=memget(blah);              //32bit
        col2=memget(blah2);             //32bit
        colf.r=col1.r+col2.r;
        colf.g=col1.g+col2.g;
        colf.b=col1.b+col2.b;
                                        //but now instead of dividing by 2 as
                                        //we do for transparency, we clip
                                        // to 255;
        if (colf.r>255) colf.r=255;
        if (colf.g>255) colf.g=255;
        if (colf.b>255) colf.b=255;
        memput(blah3)=colf;

Doing that for every pixel would be VERY slow yes? Even doing that in normal ASM would be slowish. But MMX can make it easier. I use NASM, you should too :)
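
For reference, here is the pseudo code above as real C for one pixel. It is just a sketch: add_clip is a made-up name, and it leaves the alpha byte at 0.

```c
#include <stdint.h>

/* Add two 32bpp pixels channel by channel, clipping each at 255. */
uint32_t add_clip(uint32_t c1, uint32_t c2)
{
    uint32_t out = 0;
    /* shift = 0 is blue, 8 is green, 16 is red */
    for (int shift = 0; shift < 24; shift += 8) {
        uint32_t sum = ((c1 >> shift) & 255u) + ((c2 >> shift) & 255u);
        if (sum > 255u)
            sum = 255u;
        out |= sum << shift;
    }
    return out;
}
```

That is three extracts, three adds and three compares for every single pixel, which is why it gets slow over a whole surface.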

---==[MMX Helps out]==--------------------------------------------------------

Saturating arithmetic means results that don't wrap around on overflow. Normally if you added 250+20 in a byte value (say AL), at the end AL would = 14 (270-256). What MMX's saturating instructions do is clip the result instead. So when you do a saturated MMX add, 250+20 = 255. Funky eh? MMX works with 8 mmx registers (MM0-MM7), each of them a 64bit register. So you can store 2 32BPP pixels in each register! This is VERY cool because it means that using 1 instruction you can additively add 2 pairs of pixels.

Two MMX instructions which I have found handy are: PADDUSB & PSUBUSB


        PADDUSB - Saturated ADD, unsigned, saturated at the byte level.
        PSUBUSB - Saturated SUB, unsigned, saturated at the byte level.
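
If you want to see exactly what these two do without writing any MMX, here is a plain C emulation working on a 64bit value. It is a sketch for understanding only, and it is of course nowhere near as fast as the real instructions. The names emu_paddusb and emu_psubusb are made up.

```c
#include <stdint.h>

/* Emulates PADDUSB: 8 separate byte adds, each clipped to 255. */
uint64_t emu_paddusb(uint64_t a, uint64_t b)
{
    uint64_t out = 0;
    for (int i = 0; i < 64; i += 8) {
        unsigned x = (unsigned)((a >> i) & 255u);
        unsigned y = (unsigned)((b >> i) & 255u);
        unsigned s = x + y;
        if (s > 255u)
            s = 255u;
        out |= (uint64_t)s << i;
    }
    return out;
}

/* Emulates PSUBUSB: 8 separate byte subtracts, each clipped to 0. */
uint64_t emu_psubusb(uint64_t a, uint64_t b)
{
    uint64_t out = 0;
    for (int i = 0; i < 64; i += 8) {
        unsigned x = (unsigned)((a >> i) & 255u);
        unsigned y = (unsigned)((b >> i) & 255u);
        unsigned d = (x > y) ? (x - y) : 0u;
        out |= (uint64_t)d << i;
    }
    return out;
}
```

Note that the alpha bytes get added and clipped just like the colour bytes.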

Here are 2 MMX registers(64bits each) filled with 2 pixels each:


     [-------------------------------64 BITS-------------------------------]
     [-------------32 BITS-------------] [-------------32 BITS-------------]
     [----16 BITS----] [----16 BITS----] [----16 BITS----] [----16 BITS----]
     [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS]
MM0: AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB   
MM1: AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB

The MMX instruction: PADDUSB MM0,MM1 basically adds each 8bit segment, and clips the addition to 255. Same with PSUBUSB MM0,MM1 except that it clips to 0. Here is how we could use this in a complete function. This function does the same as the above pseudo code, but MUCH quicker.


        ;ASM 32bpp MMX adding
        mov edi,[dest]
        mov esi,[src]
        mov ecx,32000                   ;64000 pixels, 2 per loop
        @MMX_layeraddloop:
                movq MM0,[edi]          ;Move QUAD(64bits)
                movq MM1,[esi]          ;Move QUAD(64bits)
                paddusb MM0,MM1         ;Saturated Add
                movq [edi],MM0          ;Store the result in dest
                add esi,8
                add edi,8
                dec ecx
        jnz @MMX_layeraddloop
        EMMS                            ;Must always do this after a bout of
                                        ;MMX instructions

You won't believe how fast this is until you try it.

---==[Conversion]==-----------------------------------------------------------

Ok, so you've written a groovy internal 32bpp gfx library. Complete with texture-mapped four dimensional splines and beautiful particle algorithms. Now what? Well you have to copy your buffer into videomemory so that it can be seen :) The nice thing is that the viewer doesn't have to have a videocard that can handle 32BPP. You can convert the image in the buffer to the appropriate format and then flip. Eg:


        if (vmode==_32bit) FLIPtoSCREEN32_(final.addr);
                          else
        if (vmode==_text) {
                           convtxt_(final.addr,buffery.addr);
                           FLIPtoSCREENtxt_(buffery.addr);
                          } else
        if (vmode==_8bit) {
                           conv8_(final.addr,buffery.addr);
                           FLIPtoSCREEN8_(buffery.addr);
                          } else
        if (vmode==_16bit) {
                           conv16_(final.addr,buffery.addr);
                           FLIPtoSCREEN16_(buffery.addr);
                           }

A nice feature that I've added to my demo (which I'm busy writing) is that you can change videomodes while running the demo by pressing F1-F4. I thought this was quite a groovy idea :)

Before I actually sat down to code my 32BPP engine, I thought it would be very slow to convert all the time. I mean one fullscreen color conversion MUST be slow. But it's not that bad :) Why not? Ok, let's take the videomodes from the above code:


        1]  32BPP - no conversion needed.
                    Just a 256k flip.
        2]  16BPP - conversion needed.
                    But just then a 128k flip.
        3]  8BPP  - conversion needed.
                    But just then a 64k flip.
        4]  text  - conversion needed.
                    But just then an 8k flip (80x50 cells, 2 bytes each).

As you can see, even though you have to convert, the amount of data you have to push to the video card becomes less, so it sort of compensates :) And besides, the conversion routine ISN'T that costly. I actually love figuring out new ways (and faster ways) to convert between different pixel formats. It's FUN :) Below are the algorithms that I use. If you use them please credit me and send me a little email ;). I don't claim that they are the best or anything, and if you can see kewl ways to improve them please give me a shout.

---==[Converting to 24BPP]==--------------------------------------------------

Well, this should be very easy :) Just chop off the ALPHA channel. So I'll leave this one up to you :)
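
If you want a starting point anyway, a C sketch could look like this. The name conv24 is made up, and it assumes the 24bpp target wants its bytes in memory as blue, green, red, so check what your video mode actually expects.

```c
#include <stddef.h>
#include <stdint.h>

/* 32bpp -> 24bpp: copy blue, green and red, skip the alpha byte. */
void conv24(const uint32_t *src, uint8_t *dest, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        uint32_t c = src[i];
        *dest++ = (uint8_t)c;          /* blue  */
        *dest++ = (uint8_t)(c >> 8);   /* green */
        *dest++ = (uint8_t)(c >> 16);  /* red   */
    }
}
```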

---==[Converting to 16BPP]==--------------------------------------------------

Have fun trying to come up with your own methods :) I think PTC has some nice conversion routines, although I have yet to check them out.


        ;32BPP->16BPP conversion(320x200)
        proc conv16_ src,dest:dword
            pushad
            push edi
            push esi
            mov edi,[dest]
            mov esi,[src]
            mov ecx,64000
            @conv16_loop:
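                mov eax,[esi]           ;eax = AARRGGBB
                and eax,00000000111110001111110011111000b ;keep top 5/6/5 bits
                shr ah,2                ;ah = 00GGGGGG
                shr ax,3                ;ax = 00000GGG GGGBBBBB
                ror eax,8               ;al = 00000GGG, ah = RRRRR000
                add al,ah               ;al = RRRRRGGG
                rol eax,8               ;ax = RRRRRGGG GGGBBBBB
                stosw                   ;write the 16bpp pixel
                add esi,4               ;next 32bpp pixel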
            dec ecx
            jnz @conv16_loop
            pop esi
            pop edi
            popad
            ret
        endp    conv16_
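
Here is the same 5-6-5 packing in C, which is handy for checking the output of the ASM version. It is only a sketch and the names argb_to_565 and conv16_c are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* One pixel: keep the top 5 bits of red, 6 of green, 5 of blue. */
uint16_t argb_to_565(uint32_t c)
{
    uint32_t r = (c >> 19) & 0x1Fu;
    uint32_t g = (c >> 10) & 0x3Fu;
    uint32_t b = (c >> 3)  & 0x1Fu;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Whole buffer, eg. pixels = 64000 for 320x200. */
void conv16_c(const uint32_t *src, uint16_t *dest, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        dest[i] = argb_to_565(src[i]);
}
```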

---==[Converting to 8BPP(mode13h)]==-------------------------------------------

Have fun trying to come up with your own methods :) I think PTC has some nice conversion routines, although I have yet to check them out. This function doesn't take into account the palette. In fact, all it does is assume you've set your palette to go from 0 (black) to 255 (white), and then finds the approximate brightness of the RGB values and uses that. I know it's lame :), but I've seen other demos doing the same thing. Oh well, I'm sure I'll write a color palette version very soon, as I've only adopted this 32BPP internal mode about 2 weeks ago.


        ;32BPP->8BPP conversion(320x200)
        proc conv8_ src,dest:dword
            pushad
            push edi
            push esi
            mov edi,[dest]
            mov esi,[src]
            mov ecx,64000
            @conv8_loop:
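                mov ebx,[esi]           ;ebx = AARRGGBB
                mov eax,ebx
                rol ebx,16              ;ebx = GGBBAARR
                and ebx,255             ;ebx = red
                and eax,255             ;eax = blue
                add ax,bx               ;ax = red+blue
                mov ebx,[esi]           ;reload, the AND above wiped green
                shr ebx,8
                and ebx,255             ;ebx = green
                add ax,bx               ;ax = red+green+blue
                shr eax,2               ;divide by 4, result is 0..191
                stosb
                add esi,4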
            dec ecx
            jnz @conv8_loop
            pop esi
            pop edi
            popad
            ret
        endp    conv8_
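
The brightness idea in C, as a sketch with made-up names (argb_to_grey, conv8_c). It adds red, green and blue and divides by 4, which is a cheap shift instead of a real divide by 3, so pure white comes out as 191 rather than 255.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough brightness of one pixel: (R+G+B)/4, range 0..191. */
uint8_t argb_to_grey(uint32_t c)
{
    unsigned r = (c >> 16) & 255u;
    unsigned g = (c >> 8) & 255u;
    unsigned b = c & 255u;
    return (uint8_t)((r + g + b) >> 2);
}

/* Whole buffer, eg. pixels = 64000 for 320x200. */
void conv8_c(const uint32_t *src, uint8_t *dest, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        dest[i] = argb_to_grey(src[i]);
}
```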

---==[Converting to TextMode]==-----------------------------------------------

Hmmm, this was hard :) hehe, it's amazing that with these graphics modes, it seems to get easier the more colors you can have. I mean 32BPP is dead easy to code, 16BPP is harder, and textmode is quite a mission :) This is a VERY simple hack, and if you can make a better one, please let me know all about it. This one just writes character #176, #177, #178 or #219 to the screen depending on the brightness of the RGB value. And it also selects the color (0-15) based on the "brightness" of the RGB value. So it assumes that your palette goes from dark-->bright. Unfortunately I haven't made it do funky things like realtime change the palette or search for the best color. I'll probably do this soon. This is basically just a test:


        ;32BPP->Textmode conversion(80x50)
        proc convtxt_ src,dest:dword
            pushad
            push edi
            push esi
            mov edi,[dest]
            mov esi,[src]
            mov edx,50
            @convtxt_loopy:
                mov ecx,80
                @convtxt_loopx:
                mov ebx,[esi]           ;ebx = AARRGGBB
                mov eax,ebx
                rol ebx,16
                and ebx,255             ;ebx = red
                and eax,255             ;eax = blue
                add ax,bx
                mov ebx,[esi]           ;reload, the AND above wiped green
                shr ebx,8
                and ebx,255             ;ebx = green
                add ax,bx
                shr eax,2               ;al = brightness, 0..191
                mov ah,al

                ;pick the character. The compares are unsigned (jb), since
                ;brightness can go above 127
                mov bl,0                ;black stays blank
                cmp al,0
                je @ascout
                mov bl,176
                cmp al,48
                jb @ascout
                mov bl,177
                cmp al,96
                jb @ascout
                mov bl,178
                cmp al,144
                jb @ascout
                mov bl,219
                @ascout:

                shr ah,4
                mov al,bl
                stosw
                add esi,16
                dec ecx
                jnz @convtxt_loopx
            add esi,3840
            dec edx
            jnz @convtxt_loopy
            pop esi
            pop edi
            popad
            ret
        endp    convtxt_
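
A C sketch of the same scheme, for anyone who wants to play with the thresholds. The names argb_to_textcell and convtxt_c are made up. It uses the brightness bands from the routine above (48, 96, 144), puts a value that lands exactly on a boundary into the brighter band (my choice for this sketch), and samples every 4th pixel of every 4th row to get from 320x200 down to 80x50.

```c
#include <stdint.h>

/* One text cell: low byte = character, high byte = colour attribute. */
uint16_t argb_to_textcell(uint32_t c)
{
    unsigned grey = (((c >> 16) & 255u) + ((c >> 8) & 255u) + (c & 255u)) >> 2;
    unsigned ch;

    if (grey == 0)        ch = 0;    /* black stays blank  */
    else if (grey < 48)   ch = 176;  /* light shade        */
    else if (grey < 96)   ch = 177;  /* medium shade       */
    else if (grey < 144)  ch = 178;  /* dark shade         */
    else                  ch = 219;  /* solid block        */

    unsigned attr = grey >> 4;       /* colour 0..11, dark to bright */
    return (uint16_t)((attr << 8) | ch);
}

/* 320x200x32bpp -> 80x50 text cells. */
void convtxt_c(const uint32_t *src, uint16_t *dest)
{
    for (int y = 0; y < 50; y++)
        for (int x = 0; x < 80; x++)
            dest[y * 80 + x] = argb_to_textcell(src[(y * 4) * 320 + x * 4]);
}
```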

---==[Closing Words]==--------------------------------------------------------

Phew :) I really hope this helps some people out there, in some way or another. Please send me any thoughts/ideas/improvements on this topic, I'd really like to hear/see them. The scene is wonderful, long live the scene. When I die I want to go to a scene heaven ;)

-Rawhed/Sensory Overload
-Mailto:sfeist@netactive.co.za
-Htpp://www.surf.to/demos/
-Andrew Griffiths
-South Africa
-05-07-1999