Friday, November 16, 2012

Writing GPU Shaders in F#

I have been working lately on a library that allows one to write shaders in F#. It uses code quotations to translate these shaders to HLSL before being executed on the GPU. Check out SharpShaders on GitHub.

Monday, November 5, 2012

Auomatically enforcing HLSL Struct Layout with PostSharp

Anyone doing managed DirectX programming will at some point have to marshal data between the CPU and the GPU. In order to do this, the struct layout needs to adhere to some rules. The rules are:
  • Fields must start on a 4 byte boundary
  • If fields cross a 16 byte boundary, they have to start on a 16 byte boundary
  • Unfortunately in the .NET world, memory is not necessarily laid out in the same order that it was declared. In order to enforce a layout that adheres to the above rules, we have to declare each field offset explicitly. Doing this manually is time consuming and error prone. For example:
    [<Struct; StructLayout(LayoutKind.Explicit, Size=48)>]
    type SceneConstants_IfYouDontHavePostSharp =
        [<FieldOffset(0)>]  val Eye             : float3
        [<FieldOffset(16)>] val Light           : float3
        [<FieldOffset(32)>] val AmbientLight    : float3
        [<FieldOffset(44)>] val LightRangeSquared  : float32
        with
        new(eye, light, ambientLight, lightRange) = { 
            Eye = eye
            Light = light
            AmbientLight = ambientLight
            LightRangeSquared = lightRange }
    

    To avoid having to write all this boilerplate code every time we create a new shader, we can turn to aspect oriented programming.

    Using the PostSharp framework, one could create an aspect that will enforce the HLSL constant packing rules at compile time. It will actually weave in the necessary code into your existing class. For instance, the ConstantPacking aspect would be used as follows:
    [<Struct;ConstantPacking>]
    type SceneConstants(eye:float3, light:float3, ambientLight:float3, lightRangeSquared:float32) =
        member m.Eye = eye
        member m.Light = light
        member m.AmbientLight = ambientLight
        member m.LightRangeSquared = lightRangeSquared
    
    Isn't that neat? By just adding the magic ConstantPacking attribute, all the packing rules are enforced! The source code for this magic class can be found here.
    There is one downside to using PostSharp as your AOP framework. It is not free. However, its uses stretch far beyond enforcing layout rules. As programmers we put up with writing repetitive boilerplate code far more often than we need to. AOP helps to keep things DRY, not only when writing shaders. Considering how powerful this tool is and how much time it can save you, it is a small investment.

    Saturday, January 23, 2010

    Prime numbers

    Erik Meijer has done this excellent series in functional programming which is available at Channel9.msdn.com
    In one of the lectures he gives this example of generating an infinite list of prime numbers using lazy evaluation. His example was in Haskell. Here is the equivalent in F#:


    let rec sieve nums = seq {
    let head = Seq.head nums
    yield head
    let tail = Seq.skip 1 nums
    let notMultipleOf n x = x % n <> 0
    yield! sieve (Seq.filter (notMultipleOf head) tail)
    }

    let rec numbersAbove n = seq { yield n
    yield! numbersAbove (n+1) }

    let primes = sieve (numbersAbove 2)

    Tuesday, January 12, 2010

    Creating an interactive game using F#


    One of the interesting things about functional programming is the notion of immutable types. Functional programmers avoid shared mutable state like the plague. In most functional languages, all values are immutable by default. A value cannot change once it is created. If you want to change a value in struct with many fields, you need to create a new struct, copy all the values and provide a new value for the field that needs to change. My initially thoughts were that this seemed incredibly inefficient. What it buys us though is that we can now modify values without worrying about another thread trying to access the same value. This all sounds great in theory but how do you write a game without having mutable state? The following example is kindof silly, but it serves to illustrate the point. How do you do the equivalent of:

    void update() =
    int rocketHeight = 0;
    while(true)
    {
    drawRocketAt(rocketHeight);
    rocketHeight++;
    }

    One of the ways of maintaining state is to pass it around in a recursive function. The following snippet has no mutable state, yet the rocket is drawn at a new position each frame.

    let rec update(rocketHeight) =
    drawRocketAt(rocketHeight)
    update(rocketHeight + 1)
    update(0)


    This is all well and good, but surely you cannot pass all the state of every object in the same loop.
    Agents to the rescue!
    An agent or Actor is a lightweight object that runs in its own thread. Communication between actors happens through message passing. In F#, the MailboxProcessor implements this functionality and is essentially an actor/agent.

    let mailbox = MailboxProcessor.Start(fun inbox ->
    let rec update(state) =
    async {
    let! msg = inbox.Receive()
    match msg with
    | ActionA ->
    let newState = OnActionA()
    return! update(newState)
    | ActionB ->
    let newState = OnActionB()
    return! update(newState)
    | Stop -> return ()
    }
    update(initialState) )
    mailbox.Post(Action)


    Agents allows us to separate concerns and avoid having to pass unrelated game state in our main recursive loop. Of course agents are also multi-threaded so chances are that some work done by agents will actually automatically happen in parallel. The mechanism for passing messages are also thread safe. Only one request can be handled at a time.

    In the past week I have written the classic game Pong using these ideas. It is hopelessly over engineered for a game as simple, but it helps to see how this might fit into a game with a bigger scope. In this game there are 3 agents. The main loop, Score and PlayerController. Since there are two player controllers, this is actually 4 different threads. The main loop is implemented as an agent that post an update message to itself for each frame. This allows the Application thread to send a Stop message to end update loop.
    The Score agent listens for collision messages. Whenever the ball hits any of the oponents' zones, the score is updated.

    type ScoreMessage =
    | Collision of Surface
    | Stop

    type Score(physics:Physics) =

    let mailbox = MailboxProcessor.Start(fun inbox ->
    let rec update(green, red) =
    async {
    let! msg = inbox.Receive()
    match msg with
    | Collision(surface) ->
    let score =
    match surface with
    | GreenZone -> (green, red+1)
    | RedZone -> (green+1, red)
    | Wall -> (green, red)
    Debug.Trace "Score: Green-%d Red-%d" (fst score) (snd score)
    return! update(score)
    | Stop -> return ()
    }
    update(0,0) )

    let OnCollision(player) =
    mailbox.Post(player)

    do physics.Event.AddHandler(fun player -> mailbox.Post(Collision(player)))

    For the moment it only prints a value to the debug output. (It turns out SlimDX Direct2D is not completely implemented yet. You can not draw to the same surface as a Direct3D renderTarget.)
    The PlayerController listens for events from the keyboard and updates the paddle positions accordingly.
    The above example used the normal way of handling events. F# also allows you to manipulate the event stream using the Observable class. One can create an event stream that listents for events, reformat them and publish this modified event stream. One can even filter out events that you are not interested in. In the Input class I map from SlimDX KeyInputEventArgs to my own KeyEvent. I only have a subset of all the available keys because I would like them to fit into a 64bit integer bitfield. I therefor need to first map it to my representation and then filter out those that I don't have a mapping for.

    module Keyboard =
    ...
    do Device.RegisterDevice(UsagePage.Generic, UsageId.Keyboard, DeviceFlags.None)
    // Listen only for key events for which we have a mapping defined
    let Input = Device.KeyboardInput
    |> Observable.map ( fun e ->
    match keyToFlag.TryFind(e.Key) with
    | Some(key) ->
    {Key = key; IsPressed = e.State = KeyState.Pressed}
    | None -> KeyEvent.None )
    |> Observable.filter
    (fun keyEvent -> keyEvent.Key <> KeyFlags.None)

    Clients of the Keyboard module can then subscribe to this event stream. The PlayerController listens for these events and sends the appropriate message to its message queue.

    type PlayerController =
    ...
    let keyboardInput = Keyboard.Input.Subscribe(fun (keyEvent:KeyEvent) ->
    if keyEvent.Pressed(left) then mailbox.Post(Left)
    elif keyEvent.Pressed(right) then mailbox.Post(Right) )

    Monday, January 4, 2010

    Tessellation




    DirectX 11 provides three new stages to the graphics pipeline that allows us to increase the complexity of objects by generating additional polygons. For this example we will look at tesselating the triangles of an icosahedron to generate a sphere. Our vertex shader for this example simply passes the vertex position to the hull shader.

    HSInOut VertShader(VSInput input )
    {
    HSInOut Out;
    Out.Position = input.Position;
    return Out;
    }

    Like the geometry shader, the hull shader takes a draw primitive (in this case a triangle) as input is called once for each control point(vertex). The hull shader has two components. A patch constant function and the shader function. The patch constant function is responsible for calculating the tesselation factors for the edges and inside of the triangle before passing it along to the tesselation unit. The main shader can optionally perform additional computations. In this example, it only passes each vertex to the tesselator.

    HSConst PatchConstantsHS( InputPatch< HSInOut, 3 > p )
    {
    HSConst output = (HSConst)0;
    output.Edges[0] = output.Edges[1] = output.Edges[2] = 4;
    output.Inside = 4;
    return output;
    }

    [domain("tri")]
    [partitioning("integer")]
    [outputtopology("triangle_cw")]
    [patchconstantfunc("PatchConstantsHS")]
    [outputcontrolpoints(3)]
    HSInOut PassThroughHS(
    InputPatch< HSInOut, 3 > inputPatch,
    uint uCPID : SV_OutputControlPointID )
    {
    HSInOut output = (HSInOut)0;
    output.Position = inputPatch[uCPID].Position;
    return output;
    }

    The real action happens in the Domain shader. Every additional vertex that is output from the tesselator is passed to the domain shader. It provides the original primitive that was tesselated, as well as the coordinates of the newly generated vertex in relation to the original. For a triangle patch, this is the barycentric coordinates of the vertex. While this sounds really special, it is really just the weights with respect to the original triangle vertices. It can easily be transformed by multiplying each weight with the original vertex coordinate as seen in the snippet below. The newly generated vertices are still in the same plane of the original triangle. In order to provide a better looking sphere the domain shader will also push out these new vertices so that they lie on the surface of our sphere before doing the transformation to screen space.

    [domain("tri")]
    PSInput DS(
    HSConst input, const OutputPatch< HSInOut, 3 > TrianglePatch,
    float3 BarycentricCoordinates : SV_DomainLocation )
    {
    PSInput output = (PSInput)0;

    // Interpolate local space position with barycentric coordinates
    float3 position = BarycentricCoordinates.x * TrianglePatch[0].Position
    + BarycentricCoordinates.y * TrianglePatch[1].Position
    + BarycentricCoordinates.z * TrianglePatch[2].Position;
    position = normalize(position);
    output.Position = mul( float4(position, 1.0), WorldViewProjection );
    output.Normal = mul( position, (float3x3)World );
    return output;
    }

    When creating the hull and domain shader it is important to specify shader model 5.0.

    type Geometry(device, shape) =
    ...
    let hullShader =
    let byteCode = ShaderBytecode.CompileFromFile(
    "Simple.hlsl",
    "PassThroughHS",
    "hs_5_0",
    ShaderFlags.None,
    EffectFlags.None)
    new HullShader(device, byteCode)

    let domainShader =
    let byteCode = ShaderBytecode.CompileFromFile(
    "Simple.hlsl",
    "DS",
    "ds_5_0",
    ShaderFlags.None,
    EffectFlags.None)
    new DomainShader(device, byteCode)

    Something extremely important. The primitive topology should be set to a patch list with 3 control points. Failing to do this will result in several hours of pulling out your hair to figure out why things don't work.

    type Geometry(device, shape) =
    ...
    member this.Prepare(transforms) =
    do deviceContext.InputAssembler.PrimitiveTopology <-
    PrimitiveTopology.PatchListWith3ControlPoint
    do deviceContext.HullShader.Set(hullShader)
    do deviceContext.DomainShader.Set(domainShader)
    do deviceContext.DomainShader.SetConstantBuffer(
    vsConstBuffer.Update(transforms),0)
    ...

    Note also that the DomainShader now requires access to the transforms shader constants.

    Saturday, January 2, 2010

    Geometry shaders for Dummies



    DirectX 11 offers 2 new programmable shader stages in the graphics pipeline, the Hull Shader and the Domain Shader. Before I investigate those, I wanted to first look at Geometry shaders, which was actually introduced in DirectX 10 already. Geometry shaders allows you to treat your data at a primitive level rather than the vertex level. Where in a Vertex shader you have access to only a single vertex at a time, the geometry shaders gives you the three vertices of a triangle as input. The shader can then modify the vertices, add more triangles or even discard it entirely. For my example I will tessellate an Icosahedron. This is a convenient shape to subdivide since it results in a sphere with evenly distributed vertices. Each triangle of the original icosahedron is recursively subdivided into 4 triangles. The newly created vertices are then normalized so that the lie on the surface of a unit sphere.
    The vertex shader which would usually transform the vertices form local space to screen space, will now just pass the vertices along to the pipeline without any transformation.

    struct GSInput
    {
    float4 Position : POSITION;
    };
    GSInput VertShader(VSInput input )
    {
    GSInput Out;
    Out.Position = input.Position;
    return Out;
    }

    Note that the original normal of the vertices are not really of interest since for a unit sphere, the normal is the same as the position. We only pass on the vertex position in local space.
    SphereVertex is a helper function used by the Geometry shader that now does the vertex transformation.

    PSInput SphereVertex(float4 position)
    {
    PSInput output;
    output.Position = mul( position, WorldViewProjection );
    output.Normal = mul(position.xyz, (float3x3)World);
    return output;
    }

    Below is the algorithm for a single subdevision of an icosahedron. Here I ran into a limitation of HLSL. It turns out that recursive functions are not supported... One could write the algorithm iteratively but it is a bit more messy, and this exercise was really just a stab creating a geometry shader that does something useful. DirectX 11 actually has a hardware tesselator for this sort of thing. In the next tutorial I will create Hull and Domain shaders to do tessellation with arbitrary complexity.

    [maxvertexcount(12)]
    void GeomShader( triangle GSInput input[3], inout TriangleStream OutputStream )
    {
    float4 va = input[0].Position;
    float4 vb = input[1].Position;
    float4 vc = input[2].Position;
    float4 v1 = float4(normalize(va.xyz + vb.xyz),1);
    float4 v2 = float4(normalize(va.xyz + vc.xyz),1);
    float4 v3 = float4(normalize(vb.xyz + vc.xyz),1);
    OutputStream.Append( SphereVertex(vc) );
    OutputStream.Append( SphereVertex(v2) );
    OutputStream.Append( SphereVertex(v3) );
    OutputStream.Append( SphereVertex(v1) );
    OutputStream.Append( SphereVertex(vb) );
    OutputStream.RestartStrip();
    OutputStream.Append( SphereVertex(va) );
    OutputStream.Append( SphereVertex(v1) );
    OutputStream.Append( SphereVertex(v2) );
    OutputStream.RestartStrip();
    }

    When setting up the geometry, one creates a geometry shader in very much the same way as a vertex or pixel shader.

    type Geometry(device, shape) =
    ...
    let gsByteCode = ShaderBytecode.CompileFromFile(
    "Simple.hlsl",
    "GeomShader",
    "gs_4_0",
    ShaderFlags.None,
    EffectFlags.None)

    let geomShader = new GeometryShader(device, gsByteCode)
    ....
    member this.Prepare(transforms) =
    ....
    do deviceContext.GeometryShader.Set(geomShader)
    do deviceContext.GeometryShader.SetConstantBuffer(
    vsConstBuffer.Update(transforms),0)



    Note that the transforms constants (World/WorldViewProjection) that were previously used by the vertex shader must now be provided to the geometry shader instead.


    For interest sake, here is what the subdivision algorithm in F# looks like if it were to be done on the CPU instead.



    static member Sphere subdivisions =
    let t = (1.0f + sqrt(5.0f))/2.0f;
    let s = sqrt(1.0f + t*t);
    let icosahedronVertices = [|
    Vector3( t, 1.0f, 0.0f)/s;
    Vector3( -t, 1.0f, 0.0f)/s;
    Vector3( t,-1.0f, 0.0f)/s;
    Vector3( -t,-1.0f, 0.0f)/s;
    Vector3( 1.0f, 0.0f, t)/s;
    Vector3( 1.0f, 0.0f, -t)/s;
    Vector3(-1.0f, 0.0f, t)/s;
    Vector3(-1.0f, 0.0f, -t)/s;
    Vector3( 0.0f, t, 1.0f)/s;
    Vector3( 0.0f, -t, 1.0f)/s;
    Vector3( 0.0f, t,-1.0f)/s;
    Vector3( 0.0f, -t,-1.0f)/s;
    |]

    let icosahedronIndices = [|
    0; 8; 4;
    1;10; 7;
    2; 9;11;
    7; 3; 1;
    0; 5;10;
    3; 9; 6;
    3;11; 9;
    8; 6; 4;
    2; 4; 9;
    3; 7;11;
    4; 2; 0;
    9; 4; 6;
    2;11; 5;
    0;10; 8;
    5; 0; 2;
    10; 5; 7;
    1; 6; 8;
    1; 8;10;
    6; 1; 3;
    11; 7; 5;
    |]

    let rec subdevide(vertices:array, indices, level) =
    if level < subdivisions then
    let subdivideHelper(accVertices:array,
    accIndices,
    triangle:list) index =
    match triangle with
    | ib::ia::[] ->
    let ic = index
    let va, vb, vc = vertices.[ia], vertices.[ib], vertices.[ic]
    let v1, v2, v3 =
    Vector3.Normalize(va + vb),
    Vector3.Normalize(va+vc),
    Vector3.Normalize(vb + vc)
    let i1 = accVertices.Length
    let i2 = i1+1
    let i3 = i2+1
    (Array.append accVertices [|v1;v2;v3|],
    Array.append accIndices [|i1;i3;i2;
    ia;i1;i2;
    ic;i2;i3;
    ib;i3;i1|],
    [])
    | _ -> (accVertices,accIndices, index::triangle)

    let vertices, indices, _ =
    Array.fold subdivideHelper (vertices,[||],[]) indices
    subdevide(vertices, indices, level+1)
    else
    vertices, indices

    let vertices, indices = subdevide(icosahedronVertices, icosahedronIndices, 0)
    { Vertices = Array.map (fun vec -> Vertex(vec, vec)) vertices; Indices = indices }

    Thursday, December 31, 2009

    Indexed drawing

    I was getting annoyed with having to cast Math.PI which is double to float32 every time. F#, like C# 3.0 allows you to extend existing types by adding your own functionality. You don't have access to any of the internals, but you can add methods and properties that access the public interface. Here is my extension to System.Math:

    module Flow.TypeExtensions
    open System

    type System.Math with
    static member Pi = float32(Math.PI)
    static member PiOver2 = Math.Pi/2.0f
    static member TwoPi = Math.Pi*2.0f


    In this tutorial we are drawing a cube. Since a cube requires more vertices than a single triangle and many of them will be the same, we are going to use an index buffer to minimize the amount of vertices. We still need 4 vertices per face since the normals are unique per face. Still, 4 is better than 6. We create a new class type called Shape which will hold our vertices and indices. The helper function Shape.Box will create our cube by contructing a single face and rotating it to form each of the sides.

    type Shape = {
    Vertices : array< Vertex >
    Indices : array< int32 >
    }
    with
    //-------------------------------------------------------------------------
    static member Box =
    let vertices =
    let normal = Vector3( 0.0f, 0.0f,-1.0f)
    let face = [|
    Vector3(-1.0f, 1.0f,-1.0f);
    Vector3( 1.0f,-1.0f,-1.0f);
    Vector3(-1.0f,-1.0f,-1.0f);
    Vector3( 1.0f, 1.0f,-1.0f)
    |]

    let rotatedFace rotation =
    let vertices = Vector3.TransformCoordinate(face, ref rotation)
    let normal = Vector3.TransformCoordinate(normal, rotation)
    Array.map (fun vec -> Vertex(vec, normal)) vertices

    let aboutY angle = Matrix.RotationY(angle)
    let aboutX angle = Matrix.RotationX(angle)

    // Create the sides by rotating the front face 4 times about the Y axis
    let rec sides(angle, acc) =
    if (angle < Math.TwoPi) then
    sides(angle+Math.PiOver2, Array.append acc (rotatedFace(aboutY angle)))
    else
    acc
    // Create the top and bottom by rotating the front face 90 and
    // -90 degrees about the X axis
    let top, bottom = rotatedFace(aboutX Math.PiOver2), rotatedFace(aboutX -Math.PiOver2)
    Array.concat ( seq { yield sides(0.0f, [||])
    yield top
    yield bottom } )

    let indices =
    // Create indices by incrementing each element of the first face by
    // the base offset of the face. This will ensure the order is consistent
    // so all the triangles are front facing.
    let nextIndices indices increment =
    Array.map (fun x -> x+increment) indices

    let frontIndices = [|0;1;2;0;3;1|]
    let increments = [|for i in [0..5] do yield i*4|]

    Array.collect (nextIndices frontIndices) increments
    { Vertices = vertices; Indices = indices }

    Our Geometry object will now have an index buffer.

    type Geometry(device, shape) =
    ...
    let indexBufferDesc = new BufferDescription(
    BindFlags = BindFlags.IndexBuffer,
    SizeInBytes = shape.Indices.Length*sizeof< uint32 >,
    Usage = ResourceUsage.Immutable)
    let indexBuffer = new SlimDX.Direct3D11.Buffer(
    device,
    new DataStream(shape.Indices, true, false),
    indexBufferDesc)

    When preparing this geometry we set the index buffer. When drawing we now have to use DrawIndexed instead of the normal Draw call.

    member this.Prepare(transforms) =
    ...
    do deviceContext.InputAssembler.SetIndexBuffer(indexBuffer, Format.R32_UInt, 0)

    member this.Draw() =
    deviceContext.DrawIndexed(shape.Indices.Length, 0, 0)


    That pretty much it for this tutorial. As usual the full source code can be found on Google Code