Tuesday, December 18, 2012

Syntactic sugar for Unity3D binding

Recently, we've updated the Unity3D version of our product. Along with various fixes and improvements, we've added a feature that allows you to bind JavaScript methods even more easily than before. The feature is called "Automatic binding" and it was suggested by one of our beta testers (thanks, Tyler!).

Using the current pattern, you have to wait until Coherent UI fires the ReadyForBindings event and register your handlers there, which takes a fair amount of boilerplate code.
The essence of the new feature is decorating the .Net method that you want to bind with an attribute specifying the name of the corresponding JavaScript function. Using the attribute removes all that boilerplate.
Easy, right? The second argument of the attribute defines whether the JavaScript event should be treated as single- or multicast. You can check out the binding sample in our new version, along with a detailed explanation in the Coherent UI guide for Unity3D.

We're always striving to improve our product, so if you have an idea for a cool feature don't hesitate and drop us a line :).

Thursday, December 6, 2012

Far Cry 3 Review: UI & Gameplay

Far Cry 3, the long-awaited role-playing shooter, has just been released in the US for consoles and PC. Critics describe it as “something like Skyrim but with guns” and many say that this is the best game Ubisoft has released since Assassin's Creed: Brotherhood.

We don’t know if that is true yet; however, when there is so much buzz around an AAA title, the team at Coherent Labs is always curious to find out more about it. Since we believe many of you will be interested to learn more as well, we decided to share some screenshots introducing Far Cry 3’s UI and gameplay.




As you can see in the picture, there are different kinds of troops in Far Cry 3: the skulls are the normal guys, the shields are the heavies, and watch out for the lightning guys - they are fast, suicidal and probably the most dangerous NPCs. There are also two types of animals - herbivores and predators. With so many enemies, the good news is that you are able to see people through walls, which allows you to prepare a stealth attack instead of getting killed in the melee. You’re also able to spot others on your mini-map.



This beautiful HUD represents the camera that you use to zoom in and get a closer look before you decide to explore a certain location. It has a nice legend explaining the types of NPCs you might see, and it also allows you to take pictures which you can review later in your gallery.



This is what the map looks like in Far Cry 3. The red areas represent the enemy-controlled territory which you need to conquer. You can fast-travel to your camps marked on the map and, last but not least, you can see the territories of different kinds of animals - bears, deer, pigs, etc. The map also displays the cursor’s coordinates and has a legend explaining the meaning of each icon.



This is one of the RPG elements in the game, allowing you to take quests for which you get small rewards and XP. The fun part is that you’re supposed to kill the target with a specific weapon for the mission to be successful.



In the game inventory you can carry and use different kinds of potions, which you prepare yourself using ingredients that you find on the island. This is why the player in Far Cry 3 needs to pay close attention to the wildlife, or he’s going to have a tough time in the game.



There is a place in every camp where you can buy weapons and ammo, customize them by changing their skin, or improve their performance.



You can also learn lots of skills in Far Cry 3, divided into three categories:
  1. The Heron - long-range takedowns and mobility skills
  2. The Shark - assault takedowns and healing skills
  3. The Spider - stealth takedowns and survival skills


This is what the Handbook looks like. Here you can check your progress in the game: the number of missions, quests, skills learned and more useful things.

If these screenshots are not enough to give you a feel for the game’s UI and gameplay, you might want to check out this video review of Far Cry 3 by the Official PlayStation Magazine.



Far Cry 3 is one of those games that impress with both great UI and gameplay. You will be surprised how big the game's world is, and that it has a life of its own, with NPCs attacking each other on the island regardless of your presence.

Tuesday, December 4, 2012

A high level shader construction syntax - Part III

This article is part of a series of posts; the first ones can be found here:

A high level shader construction syntax - Part I
A high level shader construction syntax - Part II

Compile-time conditions

Although the proposed shader syntax minimizes the need for compile-time code branching, it is still required sometimes. Occasionally a calculation must be performed only if a specified previous step has been selected by the polymorphic decision system. Such cases should be rare.

To address the issue, two new keywords are introduced: CONTEXT_IF and CONTEXT_IFNOT. The usage is:

CONTEXT_IF(context.some_semantic) {
...
}

When a CONTEXT_IF is parsed, the translator emits the code in the curly braces only if the semantic is currently present in the context (CONTEXT_IFNOT inverts the test).

For instance, in the snippet above, if specular_color has not been calculated (i.e. GetSpecularColor() was None), the semantic will be missing from the context. In that case the CONTEXT_IFNOT will be true and the code will be added to the final shader. Keep in mind that all those conditions are evaluated at shader translation time, at the moment the translator reaches the line with the context conditional. If the tested semantic appears after the test in code, it won't influence it.

The implementation supports condition nesting as well as polymorphics in conditions. However conditions can currently appear only in the main body of the shader and not in expanded polymorphics.

Implementation

A sample implementation of a translator that implements the enhanced syntax can be found on GitHub:
http://github.com/stoyannk/ShaderTranslator.
The sample can easily be modified to become a stand-alone library.

The major components are the ShaderTranslationUniverse and the ShaderTranslator. The universe is a holder for all the atoms, combinators and polymorphics. It is highly likely that all shaders in a product share a library of those components but it might not always be the case.

The ShaderTranslator performs the translation itself given a universe and the initial source code. It outputs valid HLSL SM4 code.

The translation process itself is heavily based on regular expressions and performs a lot of string operations. To speed things up and reduce memory fragmentation, the temporary strings used during translation use a linear scratch allocator. It grabs a big chunk of memory and always returns new blocks from it without ever freeing them individually; at the end of the translation the whole memory region is freed at once. The scratch allocator is accessed as a thread-local variable. I wouldn't recommend using the translator at runtime, however - it is best suited as a step in the build process of the final product.
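A minimal sketch of such a linear scratch allocator (our illustration, not the actual code from the repository; the capacity is arbitrary):

#include <cstddef>
#include <cstdint>
#include <cstdlib>

class ScratchAllocator {
public:
    explicit ScratchAllocator(std::size_t capacity)
        : m_Begin(static_cast<char*>(std::malloc(capacity)))
        , m_Current(m_Begin)
        , m_End(m_Begin + capacity)
    {}
    ~ScratchAllocator() { std::free(m_Begin); }

    // Bump-allocate an aligned block; individual blocks are never freed.
    void* Allocate(std::size_t size, std::size_t alignment) {
        std::uintptr_t current = reinterpret_cast<std::uintptr_t>(m_Current);
        std::uintptr_t aligned =
            (current + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
        if (aligned + size > reinterpret_cast<std::uintptr_t>(m_End))
            return nullptr; // out of scratch memory
        m_Current = reinterpret_cast<char*>(aligned + size);
        return reinterpret_cast<void*>(aligned);
    }

    // Called when a translation finishes - the whole region is recycled at once.
    void Reset() { m_Current = m_Begin; }

private:
    char* m_Begin;
    char* m_Current;
    char* m_End;
};

// Accessed as a thread-local variable, as described above.
thread_local ScratchAllocator g_TranslationScratch(16 * 1024 * 1024);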

I wasn't too pedantic about fixing allocations during the translation process, so some of the containers used still access the default allocator. It would be trivial to change those too.
The main program comes with two sample shaders used for testing.

Monday, December 3, 2012

What can Unity 4 and DirectX 11 do together?

With its new version, Unity 4 introduces support for DirectX 11, which can really take your graphics to the next level. The company released a video explaining a bit about the new features; however, we believe it barely scratches the surface of the possibilities open to game developers.

Using DirectX 11 with Unity 4 means taking full advantage of features like:

Shader model 5

The main purpose of this feature is to solve a common problem in current game engines: the upsurge in the number of shaders due to the large number of permutations. In other words, for each combination of material and light, game developers must include a shader in order to handle all cases.


DirectX 11 offers an elegant solution through dynamic shader linkage.
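To make the idea concrete, here is a minimal sketch of driving dynamic shader linkage from the D3D11 API. It assumes a pixel shader whose HLSL declares an interface with implementing classes and a class-instance variable named g_Light; the names are ours and error handling is omitted:

#include <d3d11.h>

void BindShaderWithLinkage(ID3D11Device* device, ID3D11DeviceContext* context,
                           const void* psBytecode, SIZE_T psSize)
{
    // The shader is created against a class-linkage object.
    ID3D11ClassLinkage* linkage = nullptr;
    device->CreateClassLinkage(&linkage);

    ID3D11PixelShader* ps = nullptr;
    device->CreatePixelShader(psBytecode, psSize, linkage, &ps);

    // Pick the concrete implementation of the interface at bind time;
    // "g_Light" is the class-instance variable declared in the HLSL code.
    ID3D11ClassInstance* lightInstance = nullptr;
    linkage->GetClassInstance("g_Light", 0, &lightInstance);

    // One shader body, many behaviors - no permutation explosion.
    context->PSSetShader(ps, &lightInstance, 1);
}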
 
Tessellation

Tessellation is one of the biggest features of DirectX 11 in Unity 4. It simply breaks polygons down into finer pieces, which brings profound improvements to 3D graphics. For example, game developers can cut a square across its diagonal to make 2 triangles and use them to depict new information. You can see the difference in the images below.



Without Tessellation



With Tessellation

Compute shaders

Compute shaders provide high-speed general-purpose computing and take advantage of the large number of parallel processors on the GPU. They provide memory sharing and thread synchronization features to allow more effective parallel programming methods.

A great example is this demo made by Nvidia, showing a real-time simulated ocean under twilight lighting conditions.



Unity 4, DirectX 11 and UI

As you may already know, Coherent UI has recently been integrated with Unity. Coherent UI natively supports DirectX 11 rendering, and we are really happy that Unity 4 supports it too, so users will be able to use its features to create better game UI. Inspired by this, we integrated one of the Unity 4 samples with Coherent UI components and made this demo to show you what Coherent UI, Unity 4 and DirectX 11 can do together.



As you can see, there is a HUD on the main camera and two more Coherent UI Views on the paintings. You can scroll and select the paintings, interact with the UI and even play a game within the game. At the same time, performance has improved by up to 50% compared to Unity 3.5 with DirectX 9.

If you’re interested in using Coherent UI with Unity 4, you’re welcome to request a public beta of our integration with the game engine from our download page.

Thursday, November 22, 2012

Meet Coherent Labs on Game Connection in Paris

There are only 6 days left till the start of Game Connection 2012 in Paris. And we can hardly wait, because next week our team will be in the French capital along with more than 500 game studios, publishers, distributors and investors from all around the world.



If you're not familiar with Game Connection, it is one of the best game conferences and networking events in Europe. The reason is that you actually get to meet and get to know the key players in the gaming industry through something similar to "speed dating". The end goal, of course, is to find the best possible partners for your business, and this conference might be the perfect place to do so.

The Game Connection meeting app allows you to schedule up to 10 meetings per day, well in advance of the event, by sending and approving requests to and from other attendees. If you have already decided to attend, you can drop by our exhibitor table T350. Even better, request a meeting in advance, because we might be quite busy during these 4 days.

We will keep you posted on Game Connection 2012 and we will share with you our impressions of the event.



Wednesday, November 21, 2012

Unity3D integration with Coherent UI - Public Beta

We are really happy to announce today the official public beta of our Coherent UI integration with the Unity3D game engine!

In other words, with Coherent UI you can build your game user interface in modern HTML5 & CSS3 and trivially integrate it in your Unity3D game.

If you've been reading our blog, you might have already known this was coming. Earlier this month, in Coherent UI in Unity3D - First look, we released a short teaser, and later on we talked in another post about the upcoming official integration with the game engine.

For the curious ones we have prepared a video tutorial explaining how Coherent UI works inside the editor:





If this piques your curiosity and you'd like to get your hands on the public beta, don't hesitate to request your version on our download page.

We would highly appreciate your feedback on our integration with Unity3D. Feel free to contact us at 'info at coherent-labs dot com' with any problem you encounter or suggestion you have.

Tuesday, November 13, 2012

Coherent UI in the Unity3D editor - Introduction

We would like to share a video showing the first version of the Coherent UI integration in the Unity3D editor. In this video we show how to:
  • Import the Coherent UI package
  • Create an in-game object with a web page on it
  • Interact with a web page in the game
  • Add a HUD powered by Coherent UI
  • Build the game

We plan to release a Beta version of the product in a couple of weeks tops so stay tuned! For any suggestions or thoughts about the Unity integration please leave a comment or visit this thread.

How it works

The integration is very powerful yet very simple to use and requires little to no programming, as we have wrapped everything in components with editor-accessible properties. However, we also expose to script all the features available in the .NET and C++ versions of the library, so more custom behavior can be achieved as well.

What happens in the video:

1. We load an empty scene and add a floor, a light and a cube on which we'll project what in Coherent UI terms we call a 'View'. A view is something that Coherent UI renders - that could be a HUD element, a projected web page, an animation - anything wrapped in an HTML page.
Simple scene

2. We add a character controller so that we can move around.

3. We import the Coherent UI package. The component we are interested in for this tutorial is 'CoherentUIView'. We drag it onto one of the faces of the cube. All the various properties of the View are editable in the Inspector. By default it will load google.com. Let's hit 'Play'.

A Coherent UI View on an object
4. We can see the web page on the game object in Unity!

5. We change the web page and the resolution.

6. Now we have a nicer page in a better resolution.

An interactable web-page projected on a game object
7. The next thing is to make the page interactable. To do this we add a Mesh Collider component to the mesh, then a UserScript component that just handles the raycast and the mouse button translation to the Coherent UI View. We also add some code to the camera to stop it from receiving input when we hit 'L'; otherwise it's very hard to click on anything with mouse-look on. That's it - the View is now fully interactable.

8. Now for the HUD - we just drag the 'CoherentUIView' component onto our Main Camera. We have already made the HUD resources, so we copy them into our project and set the View's Page property to coui://TestPages/demo/demo.html.

'coui' is a special protocol we use to signal that the resource is local and subject to loading through the file handlers supplied by the application. In this way you can use a custom resource manager and, for instance, encrypt your UI data. The HUD looks pixelated because the resolution we set does not coincide with that of the Editor pane. It will be fine in the game when we build it, though. In your games the resolution will always be tied to the actual resolution of the camera (the back-buffer).
A sample HUD rendered through Coherent UI
9. The Coherent UI integration handles all cases and detects whether the component is attached to an object in the world or to a camera. It is compatible with post effects: we can add an effect on the camera and, by default, that effect will be applied to the View also. However, this might not be desirable, so by clicking 'Apply After Post-Effects' you can disable them on the View.
Post effects can be applied or skipped on Coherent UI Views

10. We now just have to build the game. Coherent UI resources will automatically be copied and made available at runtime.
The built game
All Coherent UI rendering happens in the native C++ plugin, so its impact on the performance of your game should be minimal.

Thursday, November 8, 2012

Porting Brackets to a new platform

Brackets is a code editor for HTML, CSS and JavaScript that is itself built in HTML, CSS and JavaScript. That makes it a fully featured desktop application written using Web technologies. Brackets embeds a browser to show and run the entire editor and extends it with additional functionality used by JavaScript. More and more applications use this kind of mixed platform: HTML/JavaScript is a first-class User Interface (UI) technology (and not only for UI) in Windows 8 applications and in other tools from Adobe such as Edge Animate, as well as in Icenium - a new cloud IDE recently announced by Telerik. Sencha Animator is also an HTML5 application, running inside Qt WebKit. Even the UI of most browsers is written in HTML.
This approach has several advantages:
  • fully cross-platform UI - as long as the browser is running on a platform, the UI is going to be running on that platform too
  • full UI customization using CSS
  • lots of available application frameworks (MVC, MVVM) - Backbone.js, knockout.js, you name it
  • lots of UI libraries - Kendo UI, jQuery UI, mochaui
  • easy development of the UI using Chrome Dev Tools, Firebug, etc.
  • Live reload of the UI, without restarting the whole application!
  • native access to filesystem and everything without sandboxing
  • use existing native APIs
  • native performance where necessary
  • only a single browser to support - this is a huge relief for the HTML/JavaScript developers
Given the advantages, I am sure that more and more mixed native and HTML applications are going to appear.

Brackets Architecture

Brackets consists of two applications running together - an HTML/JavaScript application and a native C++ shell that runs the HTML application. The native shell consists of two processes - one running the HTML application inside a browser, and one running the heavy native part like filesystem operations and number crunching. The communication between the two processes is asynchronous, using JavaScript callbacks and request / response for the native code.

The shell extends the standard HTML DOM with the following functions:
  • Open file or directory using the OS dialog
  • Read file
  • Write file
  • Create directory
  • Rename file or directory
  • Get file last modification time
  • List directory contents
  • Open and close browser for Live Preview
  • Open extensions directory in OS file browser
  • Open developer tools for brackets itself
  • Get the system default language
  • Time since the application start
  • Get the application support directory


Brackets Shell implements the asynchronous communication between the HTML render process and the native process in a very simple way: JavaScript executes a function such as appshell.fs.readdir(path, callback); the render process stores the callback in a mapping from request id to callback and calls the native process with the name of the function. When the native process is ready, it sends back the result of the call together with the request id. The HTML process finds the callback by the request id and executes it with the result.
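The same request/response mechanism, sketched in C++ for illustration (this is not Brackets' actual code, just the pattern described above):

#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

using Callback = std::function<void(const std::string& resultJson)>;

class AsyncBridge {
public:
    // Render-process side: remember the callback and send the request.
    std::uint64_t SendRequest(const std::string& functionName, Callback callback) {
        const std::uint64_t id = m_NextRequestId++;
        m_PendingCallbacks[id] = std::move(callback);
        // ... serialize { id, functionName, arguments } and send to the native process
        return id;
    }

    // Called when the native process answers a request.
    void OnResponse(std::uint64_t id, const std::string& resultJson) {
        auto it = m_PendingCallbacks.find(id);
        if (it == m_PendingCallbacks.end())
            return; // unknown or already-handled request id
        Callback cb = std::move(it->second);
        m_PendingCallbacks.erase(it);
        cb(resultJson);
    }

private:
    std::uint64_t m_NextRequestId = 1;
    std::unordered_map<std::uint64_t, Callback> m_PendingCallbacks;
};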



This architecture is the same as for any good GUI application: the UI never executes expensive functions and stays responsive, while all data manipulation happens in a separate thread.

Running Brackets in Coherent UI

Making Brackets run inside Coherent UI is really easy. We start by creating a view that loads www/index.html relative to the executable. To support Coherent UI we have to include a few JavaScript files in the index.html of Brackets.

These scripts are the Coherent UI dependencies, Coherent UI itself and the abstraction layer between the JavaScript code of Brackets and Coherent UI. Then we have to register our native callbacks for the asynchronous calls.

Brackets native functions always return an error code as the first argument of the JavaScript callback. This mechanism can be implemented in Coherent UI, but then every callback has to do two things - handle the correct result and handle the error - which is kind of annoying. Instead, engine.Call can take two callbacks - one for a successful result and one for an error. Therefore we have to wrap the normal callback in an object with separate handlers for success and error.
gist: javascript callback wrapper
All that is left now is to wrap the callbacks, use engine.Call instead of the native function, and write the corresponding native handler.

Handling synchronous calls  

Brackets also has some synchronous methods that return their result to JavaScript immediately. These methods are:
  • Get the application support directory
  • Get the system default language
  • Time since the application start
Coherent UI does not support providing synchronous JavaScript functions by design, so we will have to work around that.

The application support directory remains constant throughout a single run of Brackets, so we can set it once and for all during application initialization.

Getting the system default language can be implemented in the same way. The last method, time since the application start, is used only in the Show Performance Data menu. It can be implemented entirely in JavaScript, assuming Brackets is not going to be reloaded during the performance test run.

Porting Brackets to Linux

Since Coherent UI already runs on Linux and we have implemented most of the native functions using the cross-platform boost::filesystem library, all we have to do is show an open-file dialog, create and close a Chrome instance, and open a URL or a folder in the default OS HTML browser and file manager.

We use the GtkFileChooserDialog and the xdg-open tool. Unfortunately, Live Preview doesn't work under Linux. Live Preview in Brackets creates a new Google Chrome instance with the remote debugger enabled, attaches to the debugger using XmlHttpRequest and WebSockets, and controls the instance via the debugger. What happens on Linux is that one of the XmlHttpRequests fails with a "DOM Exception" and the debugger is unable to attach to the instance.

Another Linux-related issue is that I couldn't find a way to close a Google Chrome tab on Linux gracefully, so when the developer tools are closed you get the "Aw, Snap!" page. In a future version we will stop using Google Chrome for Live Preview and for showing the developer tools, which will fix this problem.

Here is a short video of Brackets running on Linux:

Get a prebuilt package or get Coherent UI and start hacking!

Tuesday, November 6, 2012

Coherent UI in Unity3D - First look

We want to share with you the first video of the integration of Coherent UI with Unity3D:

Coherent UI already supports HUD-type views as well as views mapped to objects, with input. The integration is designed to be very user-friendly yet powerful. The demo shows the awesome CSS3 periodic table by Ricardo Cabello, aka Mr.doob, and in Unity3D it looks even better.

Update: Check out the introduction to the integration of Coherent UI in the Unity3D editor.

Monday, November 5, 2012

Passing a struct from C# to C++ gone wrong

Have you ever tried calling a C# method returning a structure from C++? This is not a common use of interoperability so you probably haven't, but let me tell you a story about shooting yourself in the foot.

Imagine the following scenario: you have a C++ library that exports an interface of callback methods that will be called by the library during the lifetime of the application. One of the interface's methods returns a struct (let's call them GetStruct() and SimpleStruct, respectively). So far so good. Now you want to port your library to .NET by making a wrapper of the C++ library. You make a managed counterpart of the C++ interface and SimpleStruct (with all the marshaling needed, if any) and you're done. Except that it doesn't (always) work.

Here's an example of a case when it doesn't. Let this be our C++ library (the interface part I was talking about is omitted for brevity; the listing below is a reconstruction sketch - names and calling conventions are illustrative):
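// SimpleStruct has a constructor - this detail matters later.
struct SimpleStruct
{
    int Value;
    SimpleStruct() : Value(0) {}
};

// the "unnecessary typedefs"
typedef SimpleStruct ReturnType;
typedef ReturnType (__stdcall* MyCallback)(int x, int y);

MyCallback g_MyCallback = nullptr;

extern "C" __declspec(dllexport) void __stdcall SetCallback(MyCallback callback)
{
    g_MyCallback = callback;
}

extern "C" __declspec(dllexport) int __stdcall FireCallback(int x, int y)
{
    // the "weird processing" of the returned value
    ReturnType result = g_MyCallback(x, y);
    return result.Value;
}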

Nothing special about this code, except some unnecessary typedefs and the weird processing of the returned value in FireCallback, but I'll get to that in a second. The C# part of the program mirrors it: a matching SimpleStruct, a delegate type for the callback, and a MakeResult method that builds the returned struct from its two integer arguments.

Again, this is standard interop use. If you compile the library and the executable, you'd expect a 3 written to the console, but instead you get a stack corruption error.

Wait, what, stack corruption? Everything's fine before the call of g_MyCallback(x, y) but it somehow corrupts the stack. If you add a breakpoint in MakeResult in the C# code you'll notice something interesting.

The value of x is something funny and y is 1 instead of 2. It seems like the parameters are offset by one. And yet all the other methods of our imaginary interface returning primitive types work? This calls for some disassembly. Let's see what happens in g_MyCallback(x, y).

5C6C13B7 8B F4                mov         esi,esp 
5C6C13B9 8B 45 0C             mov         eax,dword ptr [y] 
5C6C13BC 50                   push        eax 
5C6C13BD 8B 4D 08             mov         ecx,dword ptr [x] 
5C6C13C0 51                   push        ecx 
5C6C13C1 8D 95 20 FF FF FF    lea         edx,[ebp-0E0h] 
5C6C13C7 52                   push        edx 
5C6C13C8 FF 15 30 71 6C 5C    call        dword ptr [g_MyCallback (5C6C7130h)]  


Ok, we push the x and y parameters and then push something else. The theory for offsetting the parameters by one seems correct. But why is the compiler doing this? Well, our method returns a struct by value and copying it isn't very effective, so the (Named) Return Value Optimization kicks in (it's applied even when compiling with /Od). In short, the last pushed parameter is the address where the returned value will be stored and no copying will occur. The C++ compiler is aware of this fact and works its magic. When we cross the language boundary to C#, however, the stack is broken. You'd expect that the CLR would know these things, and it does, but we hit a corner case. The rules for function return values can be found here. More specifically:
  • POD return values 32 bits or smaller will be returned in the EAX register.
  • POD return values 33-64 bits in size will be returned via the EAX:EDX registers.
  • For non-POD return values, or values larger than 64 bits, the calling code allocates space and passes a pointer to it via a hidden parameter on the stack. The called function writes the return value to this address.
  (The bullets are points 12, 13 and 14)

The C# compiler simply couldn't know whether the C++ structure is a POD or not, so it applies the rules for PODs and doesn't expect the hidden parameter. With the mystery unveiled, we have the following options for making our scenario work:
  • Make the structure a POD. In the example we can do this by removing the constructor. It's the only thing breaking the POD-ness.
  • Change the signature of the callback in the C++ code so it returns an integral type of the same size. In other words, change "typedef SimpleStruct ReturnType" to "typedef int ReturnType". This way the compiler won't emit code for RVO. If you have a 64-bit structure, you can use long long.
  • Instead of return value, make the structure an output parameter.
  • Add bogus fields in the structure to make it larger than 64-bits.
The last option is the least desirable one and I added it for completeness. Since having a constructor is useful in some cases, I opted for the signature change in our project, which lets us keep the non-POD parts and is hidden from the user. It's not the prettiest solution and you have to keep it in mind if you ever change the size of the structure, but it works :).

Note that for x64 builds the stack won't be corrupted (when compiling with Visual Studio) because the first four integral or pointer parameters are passed in the RCX, RDX, R8 and R9 registers. In the example function we only have 2 parameters, so the hidden RVO parameter will go to a register (if the function had 4 or more arguments, we might still corrupt the stack, if the compiler decides to push the additional arguments instead of preallocating memory by adjusting the stack pointer at the beginning of the function). The return value will be wrong though, because the compiler will generate code that interprets the returned value in RAX as an address, not a value, and will read the memory at that address. This can be fixed using the same solutions as the ones for a 32-bit build.

Friday, November 2, 2012

Client application multi-threaded rendering support

Our on-going integration effort in Unity3D prompted us to accelerate a feature we've been planning for a long time but didn't have the time to finish until now - support for client applications with a multi-threaded rendering architecture.

Although it is even now possible to incorporate Coherent UI in an application with multi-threaded rendering, it is inherently difficult because the rendering-related events must be performed in the thread that updates the system.

For the sake of simplicity I'll call the thread that performs the logic in the client app the 'update' thread and the one that renders the 'rendering' thread.

There were two major challenges we needed to overcome in order to support a separate 'rendering' thread in the client - rendering resource management and draw callbacks.

When Coherent UI needs a rendering resource - usually a texture - it calls the appropriate callback provided by the user. This happens when a new View has to be created or a View gets resized. The API used to expect the result of the operation to be immediately available.

We changed it so that it now uses a 'SurfaceResponse' object that must be signaled with the result of the operation. This signaling can happen at any later time, so the resource could be created in a separate thread. This is analogous to the resource requests (usually file-reads or file-writes) we already support in the same manner.

All new surfaces are fetched while calling 'UISystem::FetchSurfaces' for buffered Views and 'View::FetchSurface' for on-demand Views. This results in calls to the 'ViewListener::OnDraw' callback provided by the user. Usually a copy of the received surface is made for rendering, and up until now those methods had to be called in the 'update' thread. Now it is perfectly safe to call them from your 'rendering' thread. This not only allows for easy integration with multi-threaded rendering pipelines but can also be used as a performance optimization, as it avoids the need for an extra copy of the surface to be used later for rendering.

Note that 'ViewListener::DestroySurface' can now be called from both the 'update' and the 'rendering' thread, but Coherent UI has already relinquished any ownership of the surface when it calls the method, so it's trivial to dispose of it even when the callback happens in the 'update' thread.

The API changes will be available in the next version of Coherent UI.

Thursday, November 1, 2012

Announcing Coherent UI for .Net

We are proud to announce the official release of Coherent UI for .Net.

With Coherent UI, game developers and UI artists can use standard modern HTML5, CSS3 and JavaScript to create the user interface and interaction for their XNA, SlimDX or SharpDX games. Features like secure micro-transactions, an in-game store and social network integration are easy to implement using the full in-game browser that Coherent UI provides. In addition to incredible HUDs for XNA and SlimDX based games, Coherent UI gives you fully integrated browser controls for Windows Forms, WPF and Gtk#, so that you can integrate your desktop application with any social network, show YouTube videos or access a web service.
The major highlights are:
  • WinForms browser control - full HTML5 and CSS3 support with 3D transformations, HTML5 video and Flash support
  • YouTube running in a Windows Forms application
  • WPF browser control - integrated with XAML, without the complexity of using the WebBrowser control from Windows Forms
    Editing the WPF browser control XAML in Visual Studio design mode
  • Gtk# browser control - create truly cross-platform managed applications with embedded browser
  • integrates with SlimDX and XNA
  • connecting arbitrary .Net delegates to JavaScript events
  • exposing arbitrary .Net types to JavaScript
  • supports both Windows and Linux via Mono
For a complete list of Coherent UI features, visit our website.

Our next milestone is fully integrating Coherent UI with Unity and now is the time to share your thoughts and ideas about Coherent UI for Unity!

Wednesday, October 24, 2012

CryEngine 3 Integration

This week we're on a multimedia frenzy and we're presenting another video of Coherent UI :).

This one is about our experimental integration in CryEngine 3. It's experimental because our integration is with the free SDK, which doesn't expose enough resources for maximal performance (e.g. the DirectX device is not easily accessible). We wanted to have it as clean as possible (no hacky stuff!) to understand the problems one might have when trying to integrate Coherent UI in an existing engine. While this led to sub-optimal performance (technical details below the video), it was still very good and we believe it's worth showing.

The video demonstrates some exciting features of Coherent UI:
  • Displaying any website on any surface
  • Support for HTML5/CSS3, SSL
  • Social integration with Facebook, Twitter, Google+ (well, this isn't exactly demonstrated, but we have it :))
  • JavaScript binding (for engine variables and methods)
  • Live editing/debugging the interface
Without further ado (and I realize it's been quite an ado :)) I present to you Coherent UI in CryEngine 3!

Coherent UI in CryEngine 3

Coherent UI in CryEngine 3 (short version of the above)

Developer tidbits


Here's the story of the hurdles we had to overcome. Let me start with this: the lack of access to the DirectX device was extremely annoying. So, now that we've got that clear, we just started our integration... and immediately ran into a problem.

There was no easy way to create an empty texture with a specified size, let alone a shared texture for Windows Vista/7. This presented the first small inefficiency in the integration, since we had to use shared memory as the image transport mechanism from Coherent UI to CryEngine 3, which involves some memory copying (using shared textures doesn't). Even access to the device wouldn't help that much here, since we'd like to have a valid engine handle for the texture, which means that we can't bypass the engine. After some experimenting, we settled on creating a dummy material with the editor, assigning a placeholder diffuse texture and getting its texture ID at runtime.

Having the ID, we can update the texture at runtime using IRenderer::UpdateTextureInVideoMemory. This approach comes with its own set of problems, however. An obvious one is that you need a unique dummy material and diffuse texture for each Coherent UI View, which is annoying. Another problem is that the texture is not resizable, so we had an array of textures with common sizes that were the only ones allowed when resizing a View. The least obvious problem was that if the material's texture had a mip chain, IRenderer::UpdateTextureInVideoMemory did not automatically generate the lower mip levels, which produced some strange results because of the trilinear filtering. It didn't perform any kind of stretching either, which is why we only allowed preset View resolutions. You can see the mips problem here:
Problematic trilinear texture filtering

The placeholder texture

It took some time to figure out because, at first, we didn't have fancy placeholder textures, but only solid-color ones. The solution was to simply assign a texture that had only one surface (i.e. no mips). This presented another small inefficiency.

Ok, we have a texture now, we can update it and all, but how do we draw it on top of everything so it acts as a UI? After some digging in the SDK, we found that the CBitmapUI class (and more precisely, IGameFramework's IUIDraw) should be able to solve this, having various methods for drawing full-screen quads. The color and alpha channel weights were messed up, however, so we had to call IUIDraw::DrawImage beforehand, which takes the weights as parameters, so we could reset them to 1.0. We just drew a dummy image outside the viewport to reset these values - yet another small inefficiency.

Moving on to the biggest inefficiency of all: Coherent UI provides color values with premultiplied alpha, which means that transparency is already taken into account. When drawing the fullscreen quad, the blending modes in CryEngine are set to SourceAlpha/1-SourceAlpha for the source and destination colors, respectively, meaning that the source alpha will be taken into account again. What we had to do is "post-divide" the alpha value, so that when DirectX multiplies it we get the correct result. We had to do this for each pixel, involving both bitwise and floating-point operations - imagine the slowdown of doing that on a 1280x720 or even 1920x1080 image. If we had device access, all of that would be fixed with a single call setting the blend mode, but, alas, we don't. Also, if we used the DirectX 11 renderer, we'd have to do another pass over the pixels to swap their red and blue channels, because the component ordering has been changed since DirectX 10!
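For illustration, the per-pixel fix-up looks something like this (a sketch assuming 8-bit BGRA pixels, not the actual integration code):

#include <cstdint>

// Undo the alpha premultiplication so the SourceAlpha blend doesn't apply it twice.
void PostDivideAlpha(std::uint8_t* pixels, int width, int height)
{
    for (int i = 0; i < width * height; ++i) {
        std::uint8_t* px = pixels + i * 4;
        const std::uint8_t a = px[3];
        if (a == 0 || a == 255)
            continue; // fully transparent or opaque - nothing to undo
        const float inv = 255.0f / a;
        for (int c = 0; c < 3; ++c) {
            const float v = px[c] * inv;
            px[c] = static_cast<std::uint8_t>(v > 255.0f ? 255.0f : v);
        }
    }
}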

Next on the list was input forwarding - we wanted a means of stopping player input (so we don't walk or lean or anything while typing) and redirecting it to Coherent UI, so we could interact with the Views. This wasn't really a problem, but it was rather tedious - we had to register our own IInputEventListener that forwards input events to the focused View, if any. The tedious part was creating gigantic mappings for the CryEngine 3 to Coherent UI event conversion. Stopping player input when interacting with a View was easy, too - we just had to disable the "player" action map using the IActionMapManager. We also needed a free cursor while in-game, so you can move your mouse when browsing, which was just a matter of calling the Windows API ShowCursor.

The final problem was actually getting the texture coordinates of the projected mouse position on the surface below. I tried using the physics system, which provided some sort of raycasts that I got working, but I couldn't do a refined trace on the actual geometry, nor obtain it to do the tracing myself. And even if I had managed to do that, I couldn't find any way to get the texture coordinates using the free CryEngine 3 SDK. That's why I just exported the interesting objects to .obj files using the CryEngine Editor, put the geometry into a KD-tree and did the raycasting myself after all. For correct results, first we trace using the physics system so we know that no object is obstructing the View. Then we trace in the KD-tree and get the texture coordinates, which can be translated to View coordinates.
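The texture-coordinate lookup at the end boils down to a standard ray-triangle intersection whose barycentric coordinates interpolate the per-vertex UVs. A sketch (the KD-tree traversal that picks the candidate triangles is omitted):

#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec2 { float u, v; };

static Vec3 Sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 Cross(Vec3 a, Vec3 b) { return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x }; }
static float Dot(Vec3 a, Vec3 b)  { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Moller-Trumbore intersection; on a hit, writes the interpolated UVs.
bool RayTriangleUV(Vec3 orig, Vec3 dir,
                   Vec3 p0, Vec3 p1, Vec3 p2,
                   Vec2 uv0, Vec2 uv1, Vec2 uv2,
                   Vec2& outUV)
{
    const Vec3 e1 = Sub(p1, p0), e2 = Sub(p2, p0);
    const Vec3 pv = Cross(dir, e2);
    const float det = Dot(e1, pv);
    if (std::fabs(det) < 1e-6f)
        return false; // ray is parallel to the triangle
    const float invDet = 1.0f / det;
    const Vec3 tv = Sub(orig, p0);
    const float u = Dot(tv, pv) * invDet;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 qv = Cross(tv, e1);
    const float v = Dot(dir, qv) * invDet;
    if (v < 0.0f || u + v > 1.0f) return false;
    // The barycentric weights (1-u-v, u, v) interpolate any per-vertex attribute.
    outUV.u = (1.0f - u - v) * uv0.u + u * uv1.u + v * uv2.u;
    outUV.v = (1.0f - u - v) * uv0.v + u * uv1.v + v * uv2.v;
    return true;
}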

On the upside, there are some things that just worked - the JavaScript binding, debug highlights... pretty much anything that didn't rely on the CryEngine 3 renderer :).

In conclusion, it worked pretty well, although if Crytek gave us access to the rendering device we could have been much more efficient in the integration, but then again, we used the free version so that's what we get. I was thinking of ways to get the device, like scanning the memory around gEnv->pRenderer for IUnknowns (by memcmping the first 3 virtual table entries) and then querying the interface for a D3D device, or just making a proxy DLL that exports the same functions as d3d9/11.dll and installing hooks on the relevant calls, but I don't have time for such fun now.

Now that we've seen how far we can go using the free CryEngine 3 SDK, next on the agenda is full Unity3D integration (we have device access there!). Be on the lookout for it next month!

Monday, October 22, 2012

Introducing on-demand views in Coherent UI

Coherent UI is designed as a multi-process, multi-threaded module, to take advantage of modern processors and GPUs. Up until now it supported what we call 'buffered' views. All UI rendering is performed in a dedicated rendering process, which sandboxes the interface's operations; hence all commands are executed asynchronously.

This kind of view allows for perfectly smooth animations and user experience and is the natural choice for dynamic UIs and in-game browser views. However, if you need interface elements to correlate per-frame with in-game entities, buffered views might not be suitable.

Take for instance enemy players in an MMO - their nameplates must always be perfectly snapped in every frame over their heads. The same applies to RTS games - health indicators must be glued on the units and never lag behind.

On-demand views

Coherent UI is now the only HTML5-based solution that solves all these problems. We have created what we call 'on-demand' views. They allow exact synchronization between the game frame and the out-of-process UI rendering without sacrificing any performance. Everything is still asynchronous but we make strong guarantees on the content of frames you receive at any point.
With on-demand views the game can explicitly request the rendering of UI frames and is guaranteed that all events prior to the request will be executed in that frame.

The typical frame of a game that uses on-demand views looks like this:
  • update frame (move players, AI, etc.)
  • trigger UI events (i.e. set the new nameplates positions, player health, etc.)
  • request UI frame 
  • draw game frame
  • fetch UI frame
  • compose UI on the game frame
This flow ensures that the game is in perfect sync with the UI, and while the game renders its frame the UI gets simultaneously drawn in Coherent UI's internals.
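In code, the frame flow might look like this (all names here are illustrative stand-ins, not the actual Coherent UI API):

struct World {};

struct UIView {
    void TriggerEvents(const World&) {} // set nameplate positions, health, ...
    void RequestFrame() {}              // ask for a UI frame to be rendered now
    void FetchFrame() {}                // receive the finished UI frame
};

void UpdateWorld(World&) {}       // move players, AI, etc.
void RenderWorld(const World&) {} // draw the game frame
void ComposeUIOverFrame() {}      // blend the fetched UI over the game frame

void GameFrame(World& world, UIView& ui)
{
    UpdateWorld(world);      // 1. update frame
    ui.TriggerEvents(world); // 2. events sent before the request below are
                             //    guaranteed to be part of the fetched frame
    ui.RequestFrame();       // 3. request a UI frame
    RenderWorld(world);      // 4. draw the game frame while the UI renders
                             //    simultaneously out-of-process
    ui.FetchFrame();         // 5. fetch the UI frame
    ComposeUIOverFrame();    // 6. compose the UI on the game frame
}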
This video shows the new views in action. Please don't mind the programmer art.


As you can see, buffered views might introduce some delay, which is noticeable when interface elements are tied to in-game entities such as the positions of the units. On-demand views always remain in sync.

Buffered views will remain part of Coherent UI, as they are very easy to use and should be the default choice when no frame-perfect visual synchronization is required between the game and the UI. For instance, if you have an FPS game and the UI only shows a mini-map, player health and ammo, you should probably use buffered views, as no delay will ever be noticeable on the elements you are showing. The same applies to in-game browsers.

In other cases however on-demand views come to aid. They will be available for use in the next version of Coherent UI.

Friday, October 19, 2012

A high level shader construction syntax - Part II

Enhanced shader syntax

As explained in A high level shader construction syntax - Part I the proposed shader syntax is an extension over SM4 and SM5. It is simple enough to be parsed with custom regex-based code.

One of the requirements I had when designing this is that vanilla HLSL shader code going through the translator should remain unchanged.

Usually a shader is a mini-pipeline with predefined steps that only vary slightly. Take for instance the pixel shader that populates a GBuffer with per-pixel depth and normal. It has three distinct steps: take the depth of the pixel, take the normal of the pixel, and output both. Now here comes the per-material branching: some materials might have normal maps while others might use the interpolated normals from the vertices. Either way, the shader just has to complete these 3 steps - there is no difference in how you get the normal.

Nearly all shader code can be reduced to such simple steps. So I came up with the idea of what I call 'polymorphics'. They are placeholders for functions that perform a specific operation (i.e. fetch the normal) and can be varied per material.

The code for a simple GBuffer pixel shader in the extended syntax is very short. The keyword 'pixel_shader' is required so that the translator knows the type of function it is working on, and the shader declares two polymorphics - MakeDepth and GetWorldNormal - along with the functions (called 'atoms') that can substitute them.

If the material has a normal map, the translated shader contains much more code: the polymorphics have been substituted by 'atoms', i.e. functions that perform the required task. "NormalFromMap" is an atom that fetches the normal vector from a map, while "NormalFromInput" fetches it as an interpolated value from the vertices. If the material whose shader we want to create has no normal map, we simply tell the translator to use "NormalFromInput" for the polymorphic "GetWorldNormal".

All these atoms are defined elsewhere and could form an entire library. Their definitions introduce several new keywords. The all-caps words are called 'semantics': they are declared in a dedicated file and indicate the type of the placeholder name and an HLSL semantic name, used in case they should be interpolated between shading stages or come as input in the vertex shader. Semantics are essentially the variables that the shader translation system knows of.

A sample semantics file simply lists each semantic with its type and HLSL semantic name.

Of course, if we just substituted parts of the code with snippets we'd be in trouble, as different atoms require different data to work with. If we use the "NormalFromMap" atom, we need a normal map, a sampler and UV coordinates. If we use "NormalFromInput", we just need a normal vector as shader input. All functions with input - that is, atoms and the vertex/pixel shader main functions - have a 'needs' clause where all semantics required for the computation are enumerated.

The declaration/definition (they are the same) of a sample atom is as follows:

atom NORMAL_W NormalFromMap(interface context) needs NORMAL_O, UV, TBN, MAP_NORMAL, SAMPLER_POINT

'atom' is required to flag the function, followed by the return semantic and the name of the atom. 'interface context' is required as well. Atoms are not substituted by function calls but are inlined in the shader code; to avoid name clashes with vanilla code in the shader that is not dependent upon the translation system, all computed semantics (variables) are put in a special structure called 'context'. In the atom declaration the keyword 'interface' is reserved for eventual future use. Strictly speaking, 'interface context' is not currently needed, but it makes the atom resemble a real function and is a reminder that all input comes from the context. After the closing parenthesis there is an optional 'needs' clause, after which all required semantics are enumerated.

Sometimes the needed semantics are straightforward to procure - for instance, if a normal map is required, the system should simply declare a texture variable before the shader main code. Some computations, however, are much more convoluted - like computing the TBN matrix. Here comes the third type of resource used in the translation process: 'combinators'.

When the translator encounters a needed semantic, it first checks whether it has already been computed and is in the context (I remind you that all data is saved in the context). If it's a new semantic, the translator checks all combinators for one that can calculate it. Combinators, like atoms, are functions and their declarations are almost the same; an illustrative one (the semantics here are made up) might be:
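// an illustrative combinator declaration, following the atom syntax shown above
combinator TBN ComputeTBN(interface context) needs NORMAL_O, TANGENT_O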

The only difference is the keyword 'combinator' instead of 'atom'. Combinators encapsulate the code to compute a complicated semantic from simpler ones.
If no combinator is found for a needed semantic, it is assumed to come as an interpolant or as vertex shader input. The search for needed semantics is always conducted - also for the needs of combinators themselves - so combinators can depend on other combinators.


To recap, the building blocks of the shader translation process are:
  • semantics
  • atoms
  • combinators
While it might seem complicated at first, the system simplifies shader authoring a lot. The shaders themselves become much more readable, with no per-material-type branches in their logic - so no #ifdef. An atom and combinator library is trivial to build after writing some shaders - later on, operations get reused. The translation process guarantees that only needed data is computed, interpolated or required as vertex input. The 'context' structure used to hold the data incurs no performance penalty, as it is easily handled by the HLSL compiler. For convenience, expanded atoms and combinators are flagged with comments in the output HLSL code and enclosed in scopes to avoid name clashes between local variables.

In the next post I'll explain some compile-time conditions supported by the translator as well as how the translation process works.

Monday, October 15, 2012

A twist on PImpl

PImpl is a well-known pattern for reducing dependencies in a C++ project. The classic implementation looks something like this (a representative sketch):
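// HTTPServer.h - the classic PImpl: the header exposes only a pointer.
class HTTPServerImpl; // defined in HTTPServer.cpp

class HTTPServer {
public:
    HTTPServer();   // allocates the impl on the heap
    ~HTTPServer();  // deletes it
    void Start();   // forwards to the impl
private:
    HTTPServerImpl* m_Impl;
};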

It has two drawbacks:
  • inhibits function inlining
  • has an extra heap allocation and pointer chase
The extra heap allocation leads to a whole new set of drawbacks - creating an instance is more expensive, it fragments the heap memory and the address space, adds an extra pointer chase and reduces cache locality.

This allocation can be avoided by a simple trade-off with the PImpl idiom. Why allocate the HTTPServerImpl instance separately instead of storing it in the facade object? Because C++ requires the definition of HTTPServerImpl to be visible in order to store it by value. But we can store a C++ object in any memory chunk that is large enough to hold its data and respects its alignment requirements. So instead of storing an HTTPServerImpl pointer in the facade, we can store a memory chunk that is interpreted as an instance of HTTPServerImpl. This concept can easily be generalized in a reusable template along these lines (a sketch using C++11 aligned storage):
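#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

template <typename T, std::size_t Size, std::size_t Alignment>
class ImplStorage {
public:
    template <typename... Args>
    explicit ImplStorage(Args&&... args) {
        // Instantiated only in translation units where T is complete,
        // so these checks can see the real size and alignment.
        static_assert(sizeof(T) <= Size, "Size is too small for T");
        static_assert(Alignment % alignof(T) == 0, "Alignment mismatch");
        new (&m_Storage) T(std::forward<Args>(args)...);
    }
    ~ImplStorage() { Get()->~T(); }

    T* Get() { return reinterpret_cast<T*>(&m_Storage); }
    const T* Get() const { return reinterpret_cast<const T*>(&m_Storage); }

private:
    typename std::aligned_storage<Size, Alignment>::type m_Storage;
};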

And the HTTPServer facade becomes something along these lines (the size and alignment values are illustrative):
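// HTTPServer.h - no heap allocation; the hard-coded size and alignment
// must be kept in sync with the real HTTPServerImpl.
class HTTPServerImpl;

class HTTPServer {
public:
    HTTPServer();
    ~HTTPServer();
    void Start();
private:
    ImplStorage<HTTPServerImpl, 64, 8> m_Impl;
};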
This is definitely not a new technique and it is declared "deplorable" in GotW #28. It has its drawbacks, but I consider some of them acceptable trade-offs. What is more:
  • The alignment problems are mitigated by C++11 support for alignment
  • Writing operator= is not harder than writing it in general
  • The extra memory consumption is acceptable for a small number of instances, given the better cache coherency.
So, does this technique really eliminate the extra pointer chase?

Comparing the generated code for the classical implementation and the "twisted" one shows that it does: in the classical version every call first loads the impl pointer from the facade and then dereferences it, while in the "twisted" version the implementation object lives at a fixed offset inside the facade and is addressed directly.

Of course, this technique breaks the PImpl idiom and might be considered a hack. Every time the HTTPServerImpl grows beyond the hard-coded size or its alignment requirements change, we have to change the definition of the facade and recompile all the source files depending on HTTPServer.h. Given the advantages though, this is an acceptable trade-off in many situations.

Literature:
  • John Lakos; Large-Scale C++ Software Design; Addison-Wesley Longman, 1996
  • Herb Sutter; Exceptional C++: 47 Engineering Puzzles, Programming Problems, and Solutions; Addison-Wesley Longman, 2000

Thursday, October 11, 2012

A high level shader construction syntax - Part I

Shader construction

A challenge in modern graphics programming is the management of complicated shaders. The huge number of materials, lights and assorted conditions leads to a combinatorial explosion of shader code-paths.

There are many ways to cope with this problem and a lot of techniques have been developed.

Some engines, like Unreal, have taken the path led by 3D modelling applications and allow designers to 'compose' shaders from pre-created nodes that they link in shade trees. An extensive description of the technique can be found in the paper "Abstract Shade Trees" by McGuire et al. This way, however, the "Material editor" of the application usually has to be some sort of tree editor. Shaders generated this way might have performance issues if the designer isn't careful, but of course this approach gives the most freedom to the artist.

Another technique is building shaders on the fly from C++ code, as shown in "Shader Metaprogramming" by McCool et al. I've never tried such a shader definition, although I find it very compelling, due mostly to its technical implementation. You'd have to rebuild and relink C++ code on the fly to allow for interactive iteration when developing or debugging, which is not very difficult to achieve but seems a bit awkward to me. The gains in code portability, however, should not be underestimated.

Über-shaders and SuperShaders usually build upon the preprocessor and enable/disable parts of the code via defines. The major drawback is that the 'main' shader in the end always becomes a giant unreadable mess of #ifdefs that is particularly unpleasant to debug.

A small variant of the SuperShader approach is to use 'static const' variables injected by the native code and plain 'if's on them in the shader. All compilers I've seen are smart enough to compile out any branching, so essentially the static const variables work as preprocessor macros, with the added bonus that an 'if' looks better than an #ifdef and the code is a bit easier to read. On complex code, all the SuperShader problems remain.

Dynamic shader linking introduced in Shader Model 5 allows to have interfaces and some sort of virtual method calls in your shaders and allows for very elegant code.

I'd like to share an idea and sample implementation of an enhanced syntax over HLSL SM4 and SM5. It is heavily influenced by the ideas of dynamic linkage, ASTs and "Automated Combination of Real-Time Shader Programs", with some additional features, and was originally developed to support DirectX 10+-level hardware. Although the sample application works only on SM4 and SM5, it could relatively easily be ported to any modern shading language. On SM5 you could just use the built-in dynamic linkage feature.

In essence, the program translates the 'enhanced' shader to plain HLSL. The translator works like a preprocessor, so no AST is built from the code.

In the following posts I'll explain the syntax and what I tried to achieve with it as well as the implementation of the translator.

Update: A high level shader construction syntax - Part II