Rawhed's Tutorial #3:
[Introduction]
[32BPP Basics]
[Alpha Channel?]
[More 32BPP RGB]
[MMX Helps out]
[Conversion]
[Converting to 24BPP]
[Converting to 16BPP]
[Converting to 8BPP(mode13h)]
[Converting to TextMode]
[Closing Words]
---==[Introduction]==---------------------------------------------------------
This tutorial is based on how my current vesa/gfx engine works. I'd
previously been doing just 16bpp graphics, and I had to code all of my
routines for 16bpp. When I first started with 16bpp it was quite a novelty,
but after a while (tut2) I was getting irritated with it and wanted a more
flexible model. I saw demos which could run in TONS of modes like 8bpp,
15bpp, 16bpp, 24bpp, 32bpp and even textmode! A lot of demos could have
their mode changed from the commandline, and I realised that this pure 16bpp
model of mine was not so cool and very inflexible.
What this tutorial covers is a different way of coding gfx engines so that
they can handle multiple color depths. Basically what happens is that you
create all your memory buffers as if they were holding 32bpp graphics, and
all of your internal graphics code works at the 32bpp level, and then finally
when you want to flip the frame to the screen you just convert to the
appropriate bpp level. So you could have conversion functions to convert
between 32bpp-->16bpp and 32bpp-->8bpp, and then you would flip that into
video memory.
I also had a lot of trouble finding out the video mode number for 32BPP
modes. All the vesa docs I read only went up to 24BPP. Eventually I found
(from UNIVBE) that 320x200x32bpp is mode 146h.
I can't remember where I first heard of this idea, but I do know that it's
not original. In fact A LOT of demo groups use it. But since I couldn't find
any tuts on it, and I think it works very well, I wrote this tut. So let's
go then.
---==[32BPP Basics]==---------------------------------------------------------
Although 32bpp allows way more colours than the other modes (16bpp etc) it is
actually the easiest to code for! 15 & 16bpp modes are cool, but they only
offer 32768 & 65536 colours, and they are difficult to work with because they
have the RGB values packed into them (see tut1).
The 32BPP format is easy, and of course each pixel takes up 32bits (4 bytes)
of memory. You have to be careful because a 320x200 surface takes up a lot
more memory than it does in the lesser modes.
    320x200x32bpp - 256k / layer
    320x200x24bpp - 192k / layer
    320x200x16bpp - 128k / layer
    320x200x15bpp - 128k / layer ;may as well use 16bpp huh? :)
    320x200x08bpp -  64k / layer
So only 4 32bpp layers and you are using a MEG of memory!
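The arithmetic behind those figures is just width x height x bytes per
pixel. Here is a minimal C sketch of it; the function name layer_bytes is
made up for illustration, and "k" in the table is taken to mean 1000 bytes:

```c
#include <stddef.h>

/* Bytes needed for one layer. 15bpp still occupies 2 whole bytes per
   pixel, so the bit depth is rounded up to full bytes. */
static size_t layer_bytes(size_t width, size_t height, unsigned bpp)
{
    size_t bytes_per_pixel = (bpp + 7u) / 8u;
    return width * height * bytes_per_pixel;
}
```

For example layer_bytes(320, 200, 32) gives 256000, and four such layers
come to 1024000 bytes, which is the "MEG" mentioned above.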
Here is how the 4 bytes are structured:
    [1 byte]  [1 byte]  [1 byte]  [1 byte]
    AAAAAAAA  RRRRRRRR  GGGGGGGG  BBBBBBBB
8 bits for the Alpha channel, 8 for Red, 8 for Green and 8 for Blue. As you
can see, you have the same range of RGB colours as you do in 24bpp. So why
use 32bpp if it's just gonna take up more memory? Simple. First of all it's
faster. Why would it be faster to read/write 4 bytes as opposed to 3?
Basically the CPU reads and writes fastest in whole, aligned 32bit dwords,
and 3 byte pixels keep straddling dword boundaries. Also, you don't get
24bit registers. For example:
    ;24bpp clear screen (for argument's sake)
    mov edi,[dest]
    mov eax,[color]     ;24bit color, with upper 8bits=0
    mov ebx,eax
    shr ebx,16          ;bl = red, the third byte of each pixel
    mov ecx,64000       ;number of pixels for 320x200
  @slowloop:
    stosw               ;write 2 bytes (blue, green)
    mov [edi],bl        ;write 1 byte (red)
    inc edi
    dec ecx
    jnz @slowloop

    ;32bpp clear screen
    mov edi,[dest]
    mov eax,[color]     ;32bit color, with upper 8bits=0
    mov ecx,64000       ;number of pixels for 320x200
    rep stosd           ;loop writing 32bits/time
Because there is no easy way of writing 3 bytes at a time, it's much
easier to write 4 bytes. Hence 32bpp modes :)
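To make the pixel layout concrete, here is a small C sketch of packing and
unpacking the four bytes. The helper names (pack_argb, argb_r and friends,
clear32) are invented for this example. Note that the AAAAAAAA RRRRRRRR
GGGGGGGG BBBBBBBB picture reads from the top byte of the dword down to the
bottom one; on a little-endian x86 the bytes sit in memory as blue, green,
red, alpha.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack four 8-bit channels into one 0xAARRGGBB dword. */
static uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Pull the individual channels back out. */
static uint8_t argb_a(uint32_t c) { return (uint8_t)(c >> 24); }
static uint8_t argb_r(uint32_t c) { return (uint8_t)(c >> 16); }
static uint8_t argb_g(uint32_t c) { return (uint8_t)(c >> 8); }
static uint8_t argb_b(uint32_t c) { return (uint8_t)c; }

/* Clear a surface with one 32-bit store per pixel, the same job the
   rep stosd version does. */
static void clear32(uint32_t *dest, size_t pixels, uint32_t color)
{
    for (size_t i = 0; i < pixels; i++)
        dest[i] = color;
}
```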
---==[Alpha Channel?]==-------------------------------------------------------
Well I must confess, at the time that I'm typing this I've never used the
alpha channel, or really thought about what it could be used for... So I'm
sort of gonna be making this up as I go along. But I'm sure you can think of
groovy things to use it for. Having an extra 8 bits on your layers/surfaces
is very handy indeed.
1]You could use it to define MANY characteristics of the surface pixels.
  Eg.
      A A A A A A A A
      7 6 5 4 3 2 1 0  bits
      | | | | | | | |
      | | | | | | | |____Active
      | | | | | | |
      | | | | |_|_|
      | | | |   |________Draw style
      | | | |
      |_|_|_|
         |_______________Percentage Transparent

  !Active(0-1)     - Whether the pixel is drawn/not. Useful for images
                     with holes in them. Sort of like a built-in mask.
  !Draw style(0-7) - How to draw the pixel. eg, 0=normal(opaque)
                     1=additive 2=subtractive 3=multiplication
                     4=difference 5=transparent 6=? 7=?
  !Percentage Transparent(0-15) - How transparent the pixel is. So
                     15=fully opaque, and 0=invisible.

  This is just an example of one way you could do things. Although I think a
  simplified version of the above would be better for the realtime demos of
  today.
2]You could keep things simple and just use the 8 alpha bits for doing your
own internal transparency etc. This is probably what most people use it
for. Very handy, but not something I've done myself.
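As a rough illustration of idea 1], here is how the example bit layout
could be pulled apart in C. This is only a sketch of that one made-up
layout (bit 0 = active, bits 1-3 = draw style, bits 4-7 = transparency);
the struct and function names are hypothetical.

```c
#include <stdint.h>

struct pixel_flags {
    unsigned active;        /* bit 0:    1 = draw the pixel, 0 = skip it */
    unsigned draw_style;    /* bits 1-3: 0=normal, 1=additive, ...       */
    unsigned transparency;  /* bits 4-7: 0 = invisible .. 15 = opaque    */
};

/* Split the alpha byte of a 0xAARRGGBB pixel into its three fields. */
static struct pixel_flags decode_alpha(uint32_t argb)
{
    uint8_t a = (uint8_t)(argb >> 24);
    struct pixel_flags f;
    f.active       = a & 1u;
    f.draw_style   = (a >> 1) & 7u;
    f.transparency = (a >> 4) & 15u;
    return f;
}
```

A drawing loop could then skip pixels with active == 0 and switch on
draw_style, at the cost of a few extra shifts and masks per pixel.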
---==[More 32BPP RGB]==-------------------------------------------------------
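Ok, so now you know the format etc. Now to show you some nice things. Want
to blend 2 RGB pixels together 50/50? Sure, easy - not like 16bpp.

    ;averaging 2 32bit colors (a 50/50 blend; the alpha byte is ignored)
    mov eax,[col1]
    mov ebx,[col2]
    ;drop the low bit of every byte so the shift can't leak a bit
    ;into the channel below it
    and eax,11111110_11111110_11111110_11111110b
    and ebx,11111110_11111110_11111110_11111110b
    shr eax,1           ;halve every channel of col1
    shr ebx,1           ;halve every channel of col2
    add eax,ebx         ;sum of the halves = average, can't overflow
    mov [edi],eax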
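The mask, shift and add trick from the routine above gives the average of
the two pixels. The same thing in C, as a sketch (average_rgb is just a
name picked for this example):

```c
#include <stdint.h>

/* Average two 0xAARRGGBB pixels channel by channel. Clearing the lowest
   bit of every byte first means the shift cannot push a bit into the
   neighbouring channel, and two halves can never overflow a byte. */
static uint32_t average_rgb(uint32_t c1, uint32_t c2)
{
    return ((c1 & 0xFEFEFEFEu) >> 1) + ((c2 & 0xFEFEFEFEu) >> 1);
}
```

The price of the trick is one bit of precision: two full-white pixels
average to 254 per channel rather than 255.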
A very nice trick that I found was with MMX instructions. They have something
which I found perfect for 32BPP functions. I'm not about to write an MMX
tutorial :) so go and read another doc for that, but I want to introduce one
MMX feature in particular. Saturated arithmetic.
Lets take a simple additive surface loop. Here you have 2 320x200x32bpp
surfaces, both with pictures on them and you want to add them together.
Eg:
    //pseudo code
    long col1,col2,colf;
    col1=memget(blah);      //32bit
    col2=memget(blah2);     //32bit
    colf.r=col1.r+col2.r;
    colf.g=col1.g+col2.g;
    colf.b=col1.b+col2.b;
    //but now instead of dividing by 2 as
    //we do for transparency, we clip
    // to 255;
    if (colf.r>255) colf.r=255;
    if (colf.g>255) colf.g=255;
    if (colf.b>255) colf.b=255;
    memput(blah3)=colf;
Doing that for every pixel would be VERY slow yes? Even doing that in normal
ASM would be slowish. But MMX can make it easier. I use NASM, you should too :)
---==[MMX Helps out]==--------------------------------------------------------
Saturated instructions are ones which don't overflow. Normally if you
added 250+20 in a byte value (say AL), at the end AL would = 14, because the
result wraps around past 255. What MMX's saturated instructions do is clip
it instead. So when you do a saturated MMX add, 250+20 = 255. Funky eh? MMX
works with 8 mmx registers (MM0-MM7), each of which is a 64bit register. So
you can store 2 32BPP pixels in each register! This is VERY cool because it
means that using 1 instruction you can additively add 2 pairs of pixels.
Two MMX instructions which I have found handy are: PADDUSB & PSUBUSB
    PADDUSB - Saturated ADD, unsigned, saturated at the byte level.
    PSUBUSB - Saturated SUB, unsigned, saturated at the byte level.
Here are 2 MMX registers(64bits each) filled with 2 pixels each:
     [-------------------------------64 BITS-------------------------------]
     [-------------32 BITS-------------] [-------------32 BITS-------------]
     [----16 BITS----] [----16 BITS----] [----16 BITS----] [----16 BITS----]
     [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS] [8 BITS]
MM0: AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB
MM1: AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB AAAAAAAA RRRRRRRR GGGGGGGG BBBBBBBB
The MMX instruction: PADDUSB MM0,MM1 basically adds each 8bit segment, and
clips the addition to 255. Same with PSUBUSB MM0,MM1 except that it clips
the result to 0. Here is how we could use this in a complete function. This
function does the same as the above pseudo code, but MUCH quicker.
    ;ASM 32bpp MMX adding
    mov edi,[dest]
    mov esi,[src]
    mov ecx,32000           ;64000 pixels, 2 per MMX register
  @MMX_layeraddloop:
    movq MM0,[edi]          ;Move QUAD(64bits)
    movq MM1,[esi]          ;Move QUAD(64bits)
    paddusb MM0,MM1         ;Saturated Add
    movq [edi],MM0          ;Move QUAD(64bits) back into dest
    add esi,8
    add edi,8
    dec ecx
    jnz @MMX_layeraddloop
    EMMS                    ;Must always do this after a bout of
                            ;MMX instructions
You won't believe how fast this is until you try it.
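If you want something to check the MMX routine against (or a fallback for
machines without MMX), here is a plain C sketch of what a saturated byte add
does to a whole surface. It assumes the result should end up in the
destination surface; the names add_sat_u8 and layer_add_sat are invented
for this example. It is nowhere near as fast, that's the whole point of
PADDUSB.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two bytes and clamp at 255 instead of wrapping around. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* dest[i] = saturate(dest[i] + src[i]) on each of the 4 bytes of
   every pixel, alpha byte included, as a byte-wise add would do. */
static void layer_add_sat(uint32_t *dest, const uint32_t *src, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        uint32_t d = dest[i], s = src[i], out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            uint8_t sum = add_sat_u8((uint8_t)(d >> shift),
                                     (uint8_t)(s >> shift));
            out |= (uint32_t)sum << shift;
        }
        dest[i] = out;
    }
}
```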
---==[Conversion]==-----------------------------------------------------------
Ok, so you've written a groovy internal 32bpp gfx library. Complete with
texture-mapped four dimensional splines and beautiful particle algorithms.
Now what? Well you have to copy your buffer into video memory so that it can
be seen :) The nice thing is that the viewer doesn't have to have a videocard
that can handle 32BPP. You can convert the image in the buffer to the
appropriate format and then flip. Eg:
    if (vmode==_32bit)
        FLIPtoSCREEN32_(final.addr);
    else if (vmode==_text)
    {
        convtxt_(final.addr,buffery.addr);
        FLIPtoSCREENtxt_(buffery.addr);
    }
    else if (vmode==_8bit)
    {
        conv8_(final.addr,buffery.addr);
        FLIPtoSCREEN8_(buffery.addr);
    }
    else if (vmode==_16bit)
    {
        conv16_(final.addr,buffery.addr);
        FLIPtoSCREEN16_(buffery.addr);
    }
A nice feature that I've added to my demo (which I'm busy writing) is that
you can change videomodes while running the demo by pressing F1-F4. I
thought this was quite a groovy idea :)
Before I actually sat down to code my 32BPP engine, I thought it would be
very slow to convert all the time. I mean one fullscreen color conversion
MUST be slow. But it's not that bad :) Why not? Ok, lets take the videomodes
from the above code:
    1] 32BPP - no conversion needed. Just a 256k flip.
    2] 16BPP - conversion needed. But then just a 128k flip.
    3] 8BPP  - conversion needed. But then just a 64k flip.
    4] text  - conversion needed. But then just an 8k flip (80x50x2 bytes).
As you can see, even though you have to convert, the amount of data you have
to push to the video card becomes less, so it sort of compensates :) And
besides, the conversion routine ISN'T that costly. I actually love figuring
out new ways (and faster ways) to convert between different pixel formats.
It's FUN :) Below are the algorithms that I use. If you use them please
credit me and send me a little email ;). I don't claim that they are the best
or anything, and if you can see kewl ways to improve them please give me a
shout.
---==[Converting to 24BPP]==--------------------------------------------------
Well, this should be very easy :) Just chop off the ALPHA channel. So I'll
leave this one up to you :)
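If you want a starting point, here is one way it could look in C. This is a
sketch, not a tuned routine: it assumes the 24bpp mode wants its bytes in
memory as blue, green, red (the usual layout on PC video cards, but check
your mode info), and the name conv24 is made up to match the other
converters.

```c
#include <stddef.h>
#include <stdint.h>

/* 32BPP -> 24BPP: copy the three colour bytes, skip the alpha byte. */
static void conv24(const uint32_t *src, uint8_t *dest, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        uint32_t c = src[i];
        *dest++ = (uint8_t)c;          /* blue  */
        *dest++ = (uint8_t)(c >> 8);   /* green */
        *dest++ = (uint8_t)(c >> 16);  /* red   */
    }
}
```

The destination buffer needs 3 bytes per pixel, so 192000 bytes for a
320x200 frame.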
---==[Converting to 16BPP]==--------------------------------------------------
Have fun trying to come up with your own methods :) I think PTC has some
nice conversion routines, although I have yet to check them out.
    ;32BPP->16BPP conversion(320x200)
    proc conv16_ src,dest:dword
        pushad
        push edi
        push esi
        mov edi,[dest]
        mov esi,[src]
        mov ecx,64000
    @conv16_loop:
        mov eax,[esi]
        ;keep the top 5 bits of red, 6 of green and 5 of blue
        and eax,00000000111110001111110011111000b
        shr ah,2            ;ah = 00GGGGGG
        shr ax,3            ;ax = 00000GGG GGGBBBBB
        ror eax,8           ;al = 00000GGG, ah = RRRRR000
        add al,ah           ;al = RRRRRGGG
        rol eax,8           ;ax = RRRRRGGG GGGBBBBB
        stosw
        add esi,4
        dec ecx
        jnz @conv16_loop
        pop esi
        pop edi
        popad
        ret
    endp conv16_
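For comparison, the same 5-6-5 packing written out in C. It's a sketch with
made-up names (rgb565, conv16), handy for checking the output of the asm
routine pixel by pixel:

```c
#include <stddef.h>
#include <stdint.h>

/* Squeeze one 0xAARRGGBB pixel into RRRRRGGG GGGBBBBB. */
static uint16_t rgb565(uint32_t c)
{
    uint32_t r = (c >> 19) & 0x1Fu;  /* top 5 bits of red   */
    uint32_t g = (c >> 10) & 0x3Fu;  /* top 6 bits of green */
    uint32_t b = (c >> 3)  & 0x1Fu;  /* top 5 bits of blue  */
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* 32BPP -> 16BPP for a whole buffer. */
static void conv16(const uint32_t *src, uint16_t *dest, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        dest[i] = rgb565(src[i]);
}
```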
---==[Converting to 8BPP(mode13h)]==------------------------------------------
Have fun trying to come up with your own methods :) I think PTC has some
nice conversion routines, although I have yet to check them out. This
function doesn't take into account the palette. In fact, all it does is
assume you've set your palette to go from 0 (black) to 255 (white), and then
finds the approximate brightness of the RGB values and uses that. I know
it's lame :), but I've seen other demos doing the same thing. Oh well, I'm
sure I'll write a color palette version very soon, as I only adopted this
32BPP internal mode about 2 weeks ago.
    ;32BPP->8BPP conversion(320x200)
    proc conv8_ src,dest:dword
        pushad
        push edi
        push esi
        mov edi,[dest]
        mov esi,[src]
        mov ecx,64000
    @conv8_loop:
        mov ebx,[esi]
        mov eax,ebx
        rol ebx,16
        and ebx,255         ;ebx = red
        and eax,255         ;eax = blue
        add ax,bx           ;ax = red+blue
        mov ebx,[esi]       ;reload the pixel, the AND above wiped out green
        shr ebx,8
        and ebx,255         ;ebx = green
        add ax,bx           ;ax = red+green+blue
        shr eax,2           ;rough brightness: sum/4 (0-191)
        stosb
        add esi,4
        dec ecx
        jnz @conv8_loop
        pop esi
        pop edi
        popad
        ret
    endp conv8_
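The brightness idea described above, sketched in C. The names grey8 and
conv8 are invented here, and like the text says it assumes a grey palette
running from 0 (black) up to 255 (white):

```c
#include <stddef.h>
#include <stdint.h>

/* Rough brightness of one 0xAARRGGBB pixel: (R+G+B)/4. */
static uint8_t grey8(uint32_t c)
{
    unsigned r = (c >> 16) & 0xFFu;
    unsigned g = (c >> 8)  & 0xFFu;
    unsigned b =  c        & 0xFFu;
    return (uint8_t)((r + g + b) >> 2);
}

/* 32BPP -> 8BPP for a whole buffer. */
static void conv8(const uint32_t *src, uint8_t *dest, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++)
        dest[i] = grey8(src[i]);
}
```

Dividing by 4 is a cheap shift, but it means pure white only reaches
palette entry 191. Dividing the sum by 3 instead would use the full 0-255
range at the cost of a real division.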
---==[Converting to TextMode]==-----------------------------------------------
Hmmm, this was hard :) hehe, it's amazing that with these graphics modes, it
seems to get easier the more colors you can have. I mean 32BPP is dead
easy to code, 16BPP is harder, and textmode is quite a mission :) This is a
VERY simple hack, and if you can make a better one, please let me know all
about it. This one just writes character #176, #177, #178 or #219 to the
screen depending on the brightness of the RGB value. And it also selects the
color (0-15) based on the "brightness" of the RGB value. So it assumes that
your palette goes from dark-->bright. Unfortunately I haven't made it do
funky things like realtime change the palette or search for the best color.
I'll probably do this soon. This is basically just a test:
    ;32BPP->Textmode conversion(80x50)
    proc convtxt_ src,dest:dword
        pushad
        push edi
        push esi
        mov edi,[dest]
        mov esi,[src]
        mov edx,50
    @convtxt_loopy:
        mov ecx,80
    @convtxt_loopx:
        mov ebx,[esi]
        mov eax,ebx
        rol ebx,16
        and ebx,255         ;ebx = red
        and eax,255         ;eax = blue
        add ax,bx
        mov ebx,[esi]       ;reload the pixel, the AND above wiped out green
        shr ebx,8
        and ebx,255         ;ebx = green
        add ax,bx
        shr eax,2           ;al = rough brightness: sum/4 (0-191)
        mov ah,al
        mov bl,0            ;brightness 0 -> blank
        cmp al,0
        je @ascout
        mov bl,176          ;1-47 -> light shade
        cmp al,48
        jb @ascout          ;unsigned jumps, brightness can go above 127
        mov bl,177          ;48-95 -> medium shade
        cmp al,96
        jb @ascout
        mov bl,178          ;96-143 -> dark shade
        cmp al,144
        jb @ascout
        mov bl,219          ;144 and up -> solid block
    @ascout:
        shr ah,4            ;ah = color attribute (brightness/16)
        mov al,bl           ;al = character
        stosw
        add esi,16          ;skip 4 pixels (320/80)
        dec ecx
        jnz @convtxt_loopx
        add esi,3840        ;skip 3 more pixel rows (4 rows per text row)
        dec edx
        jnz @convtxt_loopy
        pop esi
        pop edi
        popad
        ret
    endp convtxt_
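Here is the same idea as a C sketch: sample one pixel out of every 4x4
block of the 320x200 buffer, turn its brightness into one of the shade
characters, and use brightness/16 as the color. The names shade_char and
convtxt are made up, and treating every value from 48 up to 95 as #177
(and so on at the other boundaries) is my assumption about the intended
ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Pick a shade character for a brightness value. */
static uint8_t shade_char(uint8_t brightness)
{
    if (brightness == 0)  return 0;    /* blank        */
    if (brightness < 48)  return 176;  /* light shade  */
    if (brightness < 96)  return 177;  /* medium shade */
    if (brightness < 144) return 178;  /* dark shade   */
    return 219;                        /* solid block  */
}

/* src: 320x200 pixels of 0xAARRGGBB.
   dest: 80x50 text cells, character in the low byte and the color
   attribute in the high byte, as textmode video memory expects. */
static void convtxt(const uint32_t *src, uint16_t *dest)
{
    for (size_t row = 0; row < 50; row++) {
        for (size_t col = 0; col < 80; col++) {
            uint32_t c = src[(row * 4) * 320 + col * 4];
            unsigned v = (((c >> 16) & 0xFFu) + ((c >> 8) & 0xFFu) +
                          (c & 0xFFu)) >> 2;
            unsigned attr = v >> 4;    /* color 0-15 from brightness */
            dest[row * 80 + col] =
                (uint16_t)((attr << 8) | shade_char((uint8_t)v));
        }
    }
}
```

Averaging all 16 pixels of each 4x4 block instead of sampling just one
would probably look smoother, but costs 16 times the reads.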
---==[Closing Words]==--------------------------------------------------------
Phew :) I really hope this helps some people out there, in some way or
another. Please send me any thoughts/ideas/improvements on this topic, I'd
really like to hear/see them. The scene is wonderful, long live the scene.
When I die I want to go to a scene heaven ;)
-Rawhed/Sensory Overload
-Mailto:sfeist@netactive.co.za
-Htpp://www.surf.to/demos/
-Andrew Griffiths
-South Africa
-05-07-1999