Recovering Eye space Z
During rendering of arbitrary scenes, vertices specified in object space are transformed to eye space, and the eye space coordinates are transformed to clip space with a projection matrix. The resulting 4D clip space coordinates are divided by their own w components, yielding normalized-device space coordinates. These normalized-device space coordinates are then transformed to screen space by applying the current viewport transform. The transitions from clip space to screen space are handled automatically by the graphics hardware.
The first step required is to recover the original eye space Z value of f. This involves sampling a depth value from the current depth buffer. Sampling from the depth buffer is achieved as with any other texture: A particular texel is addressed by using coordinates in the range [(0, 0), (1, 1)]. The r2 package currently assumes that the size of the viewport is the same as that of the framebuffer (width, height) and that the bottom left corner of the viewport is positioned at (0, 0) in screen space. Given the assumption on the position and size of the viewport, and assuming that the screen space position of the current light volume fragment being shaded is position = (screen_x, screen_y), the texture coordinates (screen_uv_x, screen_uv_y) used to access the current depth value are given by:
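This mapping can be sketched as follows (a hypothetical Python translation of the calculation described above; the function name is an assumption):

```python
def screen_to_uv(screen_x, screen_y, width, height):
    # Map a screen space position to texture coordinates in
    # [(0, 0), (1, 1)], assuming the viewport matches the framebuffer
    # and its bottom-left corner is at (0, 0) in screen space.
    return (screen_x / width, screen_y / height)
```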
Intuitively, (screen_uv_x, screen_uv_y) = (0, 0) when the current screen space position is the bottom-left corner of the screen, (screen_uv_x, screen_uv_y) = (1, 1) when the current screen space position is the top-right corner of the screen, and (screen_uv_x, screen_uv_y) = (0.5, 0.5) when the current screen space position is the exact center of the screen.
Originally, the spiritual ancestor of the r2 package, r1, used a standard depth buffer, and so recovering the eye space Z value required a slightly different method compared to the steps required for the logarithmic depth encoding that the r2 package uses. For historical reasons and for completeness, the method to reconstruct an eye space Z value from a traditional screen space depth value is given in the section on screen space depth encoding.
Recovering Eye space Z (Screen space depth encoding)
Assuming a screen space depth value screen_depth sampled from the depth buffer at (screen_uv_x, screen_uv_y), it's now necessary to transform the depth value back into normalized-device space. In OpenGL, screen space depth values are in the range [0, 1] by default, with 0 representing the near plane and 1 representing the far plane. However, in OpenGL, normalized-device space coordinates are in the range [(-1, -1, -1), (1, 1, 1)]. The transformation from screen space to normalized-device space is given by:
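The transformation can be sketched as (hypothetical Python; the name screen_depth_to_ndc follows the usage in the next paragraph):

```python
def screen_depth_to_ndc(screen_depth):
    # Map a screen space depth value in [0, 1] to a
    # normalized-device space Z value in [-1, 1].
    return (screen_depth * 2.0) - 1.0
```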
In order to understand how to calculate the eye space depth value from the resulting NDC Z value ndc_z = screen_depth_to_ndc screen_depth, it's necessary to understand how the normalized-device coordinates of f were derived in the first place. Given a standard 4x4 projection matrix m and an eye space position eye, clip space coordinates are calculated by Matrix4x4f.mult_v m eye. This means that the z component of the resulting clip space coordinates is given by:
Similarly, the w component of the resulting clip space coordinates is given by:
However, in the perspective and orthographic projections provided by the r2 package, Matrix4x4f.row_column m (2, 0) == 0, Matrix4x4f.row_column m (2, 1) == 0, Matrix4x4f.row_column m (3, 0) == 0, and Matrix4x4f.row_column m (3, 1) == 0. Additionally, the w component of all eye space coordinates is 1. With these assumptions, the previous definitions simplify to:
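The full and simplified definitions can be sketched in Python (a translation of the document's ML-style notation; matrices are row-major lists of rows, so m[r][c] corresponds to Matrix4x4f.row_column m (r, c), and the example matrix values are assumptions):

```python
def clip_z(m, eye):
    # Full definition: dot product of row 2 of m with the eye position.
    return sum(m[2][c] * eye[c] for c in range(4))

def clip_w(m, eye):
    # Full definition: dot product of row 3 of m with the eye position.
    return sum(m[3][c] * eye[c] for c in range(4))

def clip_z_simple(m, eye_z):
    # Simplified form, valid when m[2][0] == m[2][1] == 0 and eye_w == 1.
    return (m[2][2] * eye_z) + m[2][3]

def clip_w_simple(m, eye_z):
    # Simplified form, valid when m[3][0] == m[3][1] == 0 and eye_w == 1.
    return (m[3][2] * eye_z) + m[3][3]

# Example (assumed values): an OpenGL-style perspective projection with
# near = 1.0 and far = 100.0; only rows 2 and 3 matter here.
m = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -101.0 / 99.0, -200.0 / 99.0],
    [0.0, 0.0, -1.0, 0.0],
]
eye = (1.0, 2.0, -5.0, 1.0)
```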
It should be noted that for perspective matrices in the r2 package, Matrix4x4f.row_column m (3, 2) == -1 and Matrix4x4f.row_column m (3, 3) == 0:
This means that the w component of the resulting clip space coordinates is equal to the negated (and therefore positive) eye space z of the original coordinates.
For orthographic projections in the r2 package, Matrix4x4f.row_column m (3, 2) == 0 and Matrix4x4f.row_column m (3, 3) == 1:
This means that the w component of the resulting clip space coordinates is always equal to 1.
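Both cases can be checked with a small sketch (hypothetical Python; only row 3 of each matrix matters here, so the other rows are left as zeroes):

```python
def clip_w_simple(m, eye_z):
    # clip_w computed from row 3 of the projection matrix, assuming the
    # X and Y entries of that row are zero and eye_w == 1.
    return (m[3][2] * eye_z) + m[3][3]

# Perspective: m[3][2] == -1, m[3][3] == 0, so clip_w == -eye_z.
perspective = [[0.0] * 4, [0.0] * 4, [0.0] * 4, [0.0, 0.0, -1.0, 0.0]]
# Orthographic: m[3][2] == 0, m[3][3] == 1, so clip_w == 1.
orthographic = [[0.0] * 4, [0.0] * 4, [0.0] * 4, [0.0, 0.0, 0.0, 1.0]]
```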
As stated previously, normalized-device space coordinates are calculated by dividing a set of clip space coordinates by their own w component. So, given clip_z = ClipSpaceZSimple.clip_z_simple m eye and clip_w = ClipSpaceWSimple.clip_w_simple m eye for some arbitrary projection matrix m and eye space position eye, the normalized-device space Z coordinate is given by ndc_z = clip_z / clip_w. Rearranging the definitions of clip_z and clip_w algebraically yields an equation that takes an arbitrary projection matrix m and a normalized-device space Z value ndc_z and returns an eye space Z value:
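A hypothetical Python version of that rearrangement, with a round trip through an assumed example perspective matrix (near = 1, far = 100):

```python
def eye_z_from_ndc(m, ndc_z):
    # Rearranged from ndc_z = clip_z / clip_w, with the simplified
    # clip_z = m[2][2] * eye_z + m[2][3] and
    # clip_w = m[3][2] * eye_z + m[3][3]:
    #   ndc_z * (m[3][2] * eye_z + m[3][3]) = m[2][2] * eye_z + m[2][3]
    # Solving for eye_z:
    return (m[2][3] - (ndc_z * m[3][3])) / ((ndc_z * m[3][2]) - m[2][2])

# Round trip through an assumed example perspective matrix.
m = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, -101.0 / 99.0, -200.0 / 99.0],
    [0.0, 0.0, -1.0, 0.0],
]
clip_z = (m[2][2] * -5.0) + m[2][3]
clip_w = (m[3][2] * -5.0) + m[3][3]
ndc_z = clip_z / clip_w
```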
Recovering Eye space Position
Given that the eye space Z value is known, it's now necessary to reconstruct the full eye space position surface_eye of the surface that resulted in f.
When the current projection is a perspective projection, there is conceptually a ray passing through the near clipping plane (near) from the origin, oriented towards the eye space position (eye) of f.
When the current projection is an orthographic projection, the ray is always perpendicular to the clipping planes and is offset by a certain amount (q) on the X and Y axes.
Assuming ray = Vector3f.V3 ray_x ray_y 1.0, the eye space position of f is given by surface_eye = Vector3f.add3 q (Vector3f.scale ray eye_z). In the case of perspective projections, q = Vector3f.V3 0.0 0.0 0.0. The q term is sometimes referred to as the origin (because q is the origin of the view ray), but that terminology is not used here in order to avoid confusion between the ray origin and the eye space coordinate system origin. It's therefore necessary to calculate q and ray in order to reconstruct the full eye space position of the fragment. The way this is achieved in the r2 package is to calculate q and ray for each of the viewing frustum corners and then bilinearly interpolate between the calculated values during rendering based on screen_uv_x and screen_uv_y.
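The reconstruction step itself is small; a hypothetical Python translation of surface_eye = Vector3f.add3 q (Vector3f.scale ray eye_z):

```python
def surface_eye(q, ray, eye_z):
    # q + (ray * eye_z), applied component-wise over 3D vectors.
    return tuple(q[i] + (ray[i] * eye_z) for i in range(3))
```

For a perspective projection, q is (0, 0, 0), so the result is simply the view ray scaled by the eye space Z value.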
As stated previously, normalized-device space coordinates are in the range [(-1, -1, -1), (1, 1, 1)]. Stating each of the eight corners of the cube that defines normalized-device space as 4D homogeneous coordinates yields the following values:
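Written out as Python tuples (x, y, z, w), with near corners at z == -1 and far corners at z == 1, and with the naming convention following the pairs listed in the next paragraph, the corners are:

```python
# The eight corners of the normalized-device space cube as 4D
# homogeneous coordinates (x, y, z, w).
near_x0y0 = (-1.0, -1.0, -1.0, 1.0)
near_x1y0 = ( 1.0, -1.0, -1.0, 1.0)
near_x0y1 = (-1.0,  1.0, -1.0, 1.0)
near_x1y1 = ( 1.0,  1.0, -1.0, 1.0)
far_x0y0  = (-1.0, -1.0,  1.0, 1.0)
far_x1y0  = ( 1.0, -1.0,  1.0, 1.0)
far_x0y1  = (-1.0,  1.0,  1.0, 1.0)
far_x1y1  = ( 1.0,  1.0,  1.0, 1.0)
```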
Then, for the four pairs of near/far corners ((near_x0y0, far_x0y0), (near_x1y0, far_x1y0), (near_x0y1, far_x0y1), (near_x1y1, far_x1y1)), a q and ray value is calculated. The ray_and_q function describes the calculation for a given pair of near/far corners:
The function takes a matrix representing the inverse of the current projection matrix, and "unprojects" the given near and far frustum corners from normalized-device space to eye space. The desired ray value for the pair of corners is simply the vector that results from subtracting the near corner from the far corner, divided by its own z component. The desired q value is the vector that results from subtracting ray scaled by the z component of the near corner, from the near corner.
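A sketch of ray_and_q in Python, following the steps described above and assuming row-major 4x4 matrices; the helper names (mult_v, unproject) and the example orthographic inverse matrix are assumptions:

```python
def mult_v(m, v):
    # Multiply a row-major 4x4 matrix by a 4D column vector.
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

def unproject(m_inverse, ndc):
    # Transform an NDC position to eye space via the inverse projection,
    # then perform the homogeneous divide.
    x, y, z, w = mult_v(m_inverse, ndc)
    return (x / w, y / w, z / w)

def ray_and_q(m_inverse, near_corner, far_corner):
    near = unproject(m_inverse, near_corner)
    far = unproject(m_inverse, far_corner)
    # The ray is (far - near), divided by its own z component.
    dx, dy, dz = (far[0] - near[0], far[1] - near[1], far[2] - near[2])
    ray = (dx / dz, dy / dz, 1.0)
    # q is the near corner minus the ray scaled by the near corner's z.
    q = (near[0] - (ray[0] * near[2]),
         near[1] - (ray[1] * near[2]),
         near[2] - (ray[2] * near[2]))
    return (ray, q)

# Example (assumed): the inverse of a simple orthographic projection
# with l = -2, r = 2, b = -2, t = 2, near = 1, far = 100.
ortho_inverse = [
    [2.0, 0.0,   0.0,   0.0],
    [0.0, 2.0,   0.0,   0.0],
    [0.0, 0.0, -49.5, -50.5],
    [0.0, 0.0,   0.0,   1.0],
]
ray, q = ray_and_q(ortho_inverse,
                   (-1.0, -1.0, -1.0, 1.0),
                   (-1.0, -1.0, 1.0, 1.0))
```

For this orthographic corner pair, the resulting ray is perpendicular to the clipping planes, (0, 0, 1), and q carries the X/Y offset, (-2, -2, 0), matching the description above.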
Note: The function calculates ray in eye space, but the resulting value will have a non-negative z component. The reason for this is that the resulting ray will be multiplied by the calculated eye space Z value to produce an eye space position. If the z component of ray were negative, the resulting position would have a positive z component.
Calculating the ray and q value for each of the pairs of corners is straightforward:
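As a minimal Python sketch (ray_and_q is passed in as a parameter here so the fragment is self-contained; it is assumed to behave as described in the previous section):

```python
# NDC near/far corner pairs, ordered (x0y0, x1y0, x0y1, x1y1).
CORNER_PAIRS = [
    ((-1.0, -1.0, -1.0, 1.0), (-1.0, -1.0, 1.0, 1.0)),
    (( 1.0, -1.0, -1.0, 1.0), ( 1.0, -1.0, 1.0, 1.0)),
    ((-1.0,  1.0, -1.0, 1.0), (-1.0,  1.0, 1.0, 1.0)),
    (( 1.0,  1.0, -1.0, 1.0), ( 1.0,  1.0, 1.0, 1.0)),
]

def rays_and_qs(ray_and_q, m_inverse):
    # Calculate a (ray, q) value for each of the four corner pairs.
    return [ray_and_q(m_inverse, near, far) for near, far in CORNER_PAIRS]
```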
Then, by reusing the position = (screen_uv_x, screen_uv_y) values calculated during the initial eye space Z calculation, determining ray and q for the current fragment involves simply bilinearly interpolating between the precalculated values above. Bilinear interpolation between four vectors is defined as:
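A hypothetical Python rendering of the interpolation, applied component-wise to 3D vectors (the helper name mix is an assumption, chosen to match the GLSL built-in of the same name):

```python
def mix(a, b, t):
    # Component-wise linear interpolation between two 3D vectors.
    return tuple(a[i] + ((b[i] - a[i]) * t) for i in range(3))

def bilinear(x0y0, x1y0, x0y1, x1y1, px, py):
    # Interpolate along X on the bottom and top edges, then along Y,
    # where (px, py) are the screen UV coordinates in [0, 1].
    bottom = mix(x0y0, x1y0, px)
    top = mix(x0y1, x1y1, px)
    return mix(bottom, top, py)
```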
Finally, now that all of the required components are known, the eye space position surface_eye of f is calculated as surface_eye = Vector3f.add3 q (Vector3f.scale ray eye_z).