Assignment 1 Lecture Notes
The following will act as lecture notes to help you review the material from lecture for the assignment. You can click on the links below to get directed to the appropriate section.
Section 1: Geometric Transformations
Geometric transformations are ubiquitous in computer graphics. The three main geometric transformations that we use are translation, rotation, and scaling. We find it convenient to implement these transformations as matrix operations. This requires us to work with homogeneous coordinates instead of the usual Cartesian coordinates. For a given point in three-dimensional space with Cartesian coordinates $(x, y, z)$, we express it in homogeneous coordinates as $(x_h, y_h, z_h, w)$, where $w$ is known as the homogeneous component. The following shows the general mapping between Cartesian and homogeneous coordinates for three dimensions:

$$(x, y, z) \longleftrightarrow (x_h, y_h, z_h, w) = (xw, \; yw, \; zw, \; w), \quad w \neq 0$$

so that dividing the first three homogeneous components by $w$ recovers the Cartesian coordinates. In practice we usually take $w = 1$, giving $(x, y, z, 1)$.
Homogeneous coordinates allow us to represent geometric transformations in matrix form and their operation on points as simple matrix multiplications. Consider the translation operation. In Cartesian coordinates, the translation of a point at position $\mathbf{p} = (p_x, p_y, p_z)$ by a vector $\mathbf{t} = (t_x, t_y, t_z)$ shifts it to the new position $(p_x + t_x, \; p_y + t_y, \; p_z + t_z)$. We can implement this operation as a matrix multiplication by expressing translation in the following matrix form:

$$T = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
and then multiplying the translation matrix with the homogeneous representation of $\mathbf{p}$ (letting the $w$ component equal 1):

$$T \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \end{pmatrix} = \begin{pmatrix} p_x + t_x \\ p_y + t_y \\ p_z + t_z \\ 1 \end{pmatrix}$$
As you can see, our result is the homogeneous representation of the new position $(p_x + t_x, \; p_y + t_y, \; p_z + t_z)$ with the homogeneous component equal to 1.
Rotation and scaling are achieved similarly. The following matrix is the rotation matrix for rotating about an axis in the direction of the unit vector $\mathbf{u} = (u_x, u_y, u_z)$ counterclockwise by an angle $\theta$:

$$R = \begin{pmatrix}
u_x^2 + (1 - u_x^2)\cos\theta & u_x u_y (1 - \cos\theta) - u_z \sin\theta & u_x u_z (1 - \cos\theta) + u_y \sin\theta & 0 \\
u_y u_x (1 - \cos\theta) + u_z \sin\theta & u_y^2 + (1 - u_y^2)\cos\theta & u_y u_z (1 - \cos\theta) - u_x \sin\theta & 0 \\
u_z u_x (1 - \cos\theta) - u_y \sin\theta & u_z u_y (1 - \cos\theta) + u_x \sin\theta & u_z^2 + (1 - u_z^2)\cos\theta & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}$$
And the following is the scaling matrix for scaling the coordinates of a point by the vector $\mathbf{s} = (s_x, s_y, s_z)$:

$$S = \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
And of course, consecutive transformations can be combined into a single matrix representation. Note that the transformation applied first appears rightmost in the product, next to the point. For instance, a scaling by vector $\mathbf{s}$ followed by a translation by vector $\mathbf{t}$ of a point at $\mathbf{p}$ can be given as:

$$T S \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} s_x & 0 & 0 & 0 \\ 0 & s_y & 0 & 0 \\ 0 & 0 & s_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_x \\ p_y \\ p_z \\ 1 \end{pmatrix} = \begin{pmatrix} s_x p_x + t_x \\ s_y p_y + t_y \\ s_z p_z + t_z \\ 1 \end{pmatrix}$$
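To make the matrix bookkeeping concrete, here is a minimal C++ sketch of the ideas above. It assumes a bare-bones row-major 4×4 representation; the names `Mat4`, `translation`, `scaling`, `mul`, and `apply` are purely illustrative and not part of any assignment codebase.

```cpp
#include <array>
#include <cstdio>

// 4x4 matrix stored row-major, and a homogeneous 4-vector.
using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Translation by (tx, ty, tz): the vector goes in the last column.
Mat4 translation(double tx, double ty, double tz) {
    Mat4 m = identity();
    m[0][3] = tx; m[1][3] = ty; m[2][3] = tz;
    return m;
}

// Scaling by (sx, sy, sz): the factors go on the diagonal.
Mat4 scaling(double sx, double sy, double sz) {
    Mat4 m = identity();
    m[0][0] = sx; m[1][1] = sy; m[2][2] = sz;
    return m;
}

// Matrix-matrix product a * b.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Matrix-vector product m * v.
Vec4 apply(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * v[k];
    return r;
}

int main() {
    // Scale by (2, 2, 2), then translate by (1, 0, -3): combined matrix is T * S.
    Mat4 combined = mul(translation(1, 0, -3), scaling(2, 2, 2));
    Vec4 p = {1, 1, 1, 1};              // the point (1, 1, 1) with w = 1
    Vec4 q = apply(combined, p);        // expect (3, 2, -1, 1)
    std::printf("%g %g %g %g\n", q[0], q[1], q[2], q[3]);
    return 0;
}
```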
Section 2: World Space to Camera Space
Rendering in computer graphics is all about turning a mathematical model of a scene into an image that we can perceive. We refer to the coordinate space containing the points in our scene as the world space. Figure 1 shows the world space axes. Note that the coordinate axes here differ slightly from the traditional 3D axes in mathematics.
Figure 1: World space coordinate axes.
In order to obtain an image of our scene, we need to specify in world space a viewing location and angle from which we look. Consider Figure 2:
Figure 2: A wireframe of a face mesh shown from two different viewpoints.
Let the face mesh shown above be centered at the origin of our world space. Our default viewing location is at the origin and our default viewing angle causes us to look straight ahead in the direction of the negative z-axis. We can consider the image on the left as what we would see after moving or translating to some point along the positive z-axis. The image on the right would be what we see after we first rotate ourselves 90 degrees clockwise about the y-axis and then translate to some point along the negative x-axis.
We refer to our viewing location as the camera position and the viewing angle as the camera orientation - i.e. we consider ourselves to be looking at our scene through a camera. From the above example, we can see that it would be intuitive to specify the camera position with a translation vector and the camera orientation with a rotation axis and angle. Hence, whenever we set up a camera, we always rotate it by a specified angle about a specified axis and then translate it to a specified point.
Denote the translation matrix for the camera position as $T_c$ and the rotation matrix for the camera orientation as $R_c$. Then let $C = T_c R_c$ be the matrix describing the overall transformation for the camera. Suppose that after we set up the camera, we want to transform the entire world space into a new space with the camera at the origin and in its default orientation. This new space, which we call the camera space, provides some advantages that we will see in the next section. We can intuitively see that in order to go from world space to camera space, we apply the inverse camera transform, $C^{-1} = R_c^{-1} T_c^{-1}$, to every point in world space.
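As a concrete illustration, here is a small C++ sketch of taking a single point from world space to camera space. It uses the fact that for a rigid camera transform $C = T_c R_c$, applying $C^{-1}$ amounts to subtracting the camera position and then multiplying by the transpose of the rotation matrix. The function and variable names are illustrative only, and the example viewpoint loosely follows the right-hand image of Figure 2.

```cpp
#include <array>
#include <cmath>
#include <cstdio>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// 3x3 rotation about a unit axis u by angle theta (counterclockwise), per Section 1.
Mat3 rotation(const Vec3& u, double theta) {
    double c = std::cos(theta), s = std::sin(theta), t = 1.0 - c;
    Mat3 r{};
    r[0] = {u[0]*u[0]*t + c,        u[0]*u[1]*t - u[2]*s,  u[0]*u[2]*t + u[1]*s};
    r[1] = {u[1]*u[0]*t + u[2]*s,   u[1]*u[1]*t + c,       u[1]*u[2]*t - u[0]*s};
    r[2] = {u[2]*u[0]*t - u[1]*s,   u[2]*u[1]*t + u[0]*s,  u[2]*u[2]*t + c};
    return r;
}

// World -> camera: undo C = T_c R_c, i.e. p_cam = R_c^T (p_world - t_c).
Vec3 worldToCamera(const Vec3& p, const Vec3& camPos, const Mat3& camRot) {
    Vec3 d = {p[0] - camPos[0], p[1] - camPos[1], p[2] - camPos[2]};
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            out[i] += camRot[k][i] * d[k];   // transpose of camRot = its inverse
    return out;
}

int main() {
    const double pi = 3.14159265358979323846;
    // Camera rotated 90 degrees clockwise about the y-axis (a negative angle in the
    // counterclockwise convention), then translated to (-5, 0, 0) on the negative x-axis.
    Mat3 camRot = rotation({0, 1, 0}, -pi / 2);
    Vec3 camPos = {-5, 0, 0};
    Vec3 p = {0, 0, 0};                           // a point at the world origin
    Vec3 q = worldToCamera(p, camPos, camRot);
    std::printf("%g %g %g\n", q[0], q[1], q[2]);  // expect roughly (0, 0, -5): in front of the camera
    return 0;
}
```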
Note that the terms eye position, eye orientation, and eye space are sometimes used instead of their respective camera counterparts. Their usage is just a terminology preference, and there are no differences in the meanings. For this class, we will use the camera terms, but we should also be aware that the eye terms exist.
Section 3: Perspective Projection
When we look at a scene in real life with our eyes, objects further in the distance appear smaller than objects close by. The technical term for this effect is perspective. We want to reproduce this effect when we render scenes in computer graphics to make them look realistic. We can do so with the following process. We first surround our scene in camera space with a sideways frustum, which, when upright, is the bottom half of a square pyramid that has been cut in two along a plane perpendicular to the axis connecting the tip of the pyramid and the center of the square base. We map every point in the sideways frustum to a cube with what is called a perspective projection. A visual of the mapping is provided in Figure 3.
Figure 3: The frustum in camera space is mapped to a cube by what is called a perspective projection. The smaller base of the frustum (outlined in red) is mapped directly to the front face of the cube (also outlined in red). The rest of the frustum has to be shrunken down to form the rest of the cube.
The key idea here is that all of the frustum beyond the smaller base (outlined in red in Figure 3) must shrink in order to form the cube. For the frustum to shrink, points in the frustum must be clustered closer together. The further away a set of points is from the smaller base of the frustum, the closer together the points are clustered. This clustering causes the objects that those points form to shrink and get rendered smaller than their actual size, creating the desired perspective effect.
The cube resides in its own coordinate system known as normalized device coordinates or NDC for short. Note that this coordinate system has an inverted z-axis. This is due to convention.
We specify a perspective projection with six parameters, $l$, $r$, $t$, $b$, $n$, and $f$, for the dimensions of the frustum. $l$, $r$, $t$, and $b$ stand for left, right, top, and bottom and provide the dimensions for the smaller base of the frustum, which we will now call the near plane or projection plane. $n$ stands for near and specifies in camera space the magnitude of the negative z-coordinate of where the frustum begins. $f$ stands for far and specifies in camera space the magnitude of the negative z-coordinate of where the frustum ends.
We will now do a quick derivation of the perspective projection matrix, which transforms points from camera space into NDC given $l$, $r$, $t$, $b$, $n$, and $f$. We first look at just the $x$ and $y$ coordinate mappings. Let $(x, y, z)$ be the position of a point in our camera space and $(x_{ndc}, y_{ndc}, z_{ndc}, w_{ndc})$ be the same position in homogeneous NDC.
Recall that the projection plane of the frustum is mapped directly to the front of the cube. Since the $x$ and $y$ mappings do not need to take depth (i.e. the z-coordinate) into account, we can consider them as first mapping pairs of $(x, y)$ onto the projection plane, and then mapping the projection plane onto a cube face. Let $x_p$ and $y_p$ be the corresponding $x$ and $y$ coordinates on the projection plane. Figure 4 shows a visual of the mapping for $x$. A similar idea is used for $y$.
Figure 4: The perspective projection mapping for the $x$ and $y$ coordinates from camera space to NDC can be thought of as first mapping the coordinates onto the projection plane and then mapping the projection plane onto a cube face.
The values of $x_p$ and $y_p$ are determined using ratios of similar triangles:

$$\frac{x_p}{n} = \frac{x}{-z}, \qquad \frac{y_p}{n} = \frac{y}{-z}$$
This gives us $x_p = \frac{nx}{-z}$ and $y_p = \frac{ny}{-z}$. We then get the direct mappings for $x_{ndc}$ and $y_{ndc}$ by solving the linear equations that map $x_p \in [l, r]$ and $y_p \in [b, t]$ onto the $[-1, 1]$ extent of the cube:

$$x_{ndc} = A x_p + B, \qquad y_{ndc} = C y_p + D$$
where $A$, $B$, $C$, and $D$ are constants that we need to solve for, using the conditions $x_p = l \mapsto x_{ndc} = -1$, $x_p = r \mapsto x_{ndc} = 1$, $y_p = b \mapsto y_{ndc} = -1$, and $y_p = t \mapsto y_{ndc} = 1$. Solving the above equations and substituting in $x_p = \frac{nx}{-z}$ and $y_p = \frac{ny}{-z}$ yields:

$$x_{ndc} = \frac{\frac{2n}{r-l}x + \frac{r+l}{r-l}z}{-z}, \qquad y_{ndc} = \frac{\frac{2n}{t-b}y + \frac{t+b}{t-b}z}{-z}$$
Recall that we are working in homogeneous coordinates, hence our perspective projection matrix is going to need to accommodate the $w_{ndc}$ component. What we can do is take advantage of the fact that both $x_{ndc}$ and $y_{ndc}$ have a division by $-z$ and make $w_{ndc} = -z$, while making $x_{ndc}$ and $y_{ndc}$ equal to just their respective numerator portions from the above equations. We then have:

$$x_{ndc} = \frac{2n}{r-l}x + \frac{r+l}{r-l}z, \qquad y_{ndc} = \frac{2n}{t-b}y + \frac{t+b}{t-b}z, \qquad w_{ndc} = -z$$
For the third row, we use the fact that the mapped depth does not depend on $x$ or $y$ to write:

$$z_{ndc} = \frac{Ez + F}{-z}$$
for coefficients $E$ and $F$ that we still need to solve for. We then have, as the homogeneous component before the divide by $w_{ndc} = -z$:

$$z_{ndc} = Ez + F$$
Using the mappings $z = -n \mapsto z_{ndc} = -1$ and $z = -f \mapsto z_{ndc} = 1$ and doing some algebra gives us $E = -\frac{f+n}{f-n}$ and $F = -\frac{2fn}{f-n}$ and:

$$P = \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
as the perspective projection matrix. This is the transformation matrix we use to transform a point in camera space to homogeneous NDC. To go from homogeneous NDC to Cartesian NDC, we would divide our $x_{ndc}$, $y_{ndc}$, and $z_{ndc}$ terms by $w_{ndc}$.
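To see the matrix in action, here is a short C++ sketch that builds it from the six frustum parameters and applies it to a camera-space point, including the divide by $w_{ndc}$. The function names `perspective` and `project` are illustrative; a point on the near plane should come out with $z_{ndc} = -1$ and a point on the far plane with $z_{ndc} = 1$.

```cpp
#include <array>
#include <cstdio>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Build the perspective projection matrix from the frustum parameters l, r, t, b, n, f.
Mat4 perspective(double l, double r, double t, double b, double n, double f) {
    Mat4 p{};  // all zeros
    p[0][0] = 2 * n / (r - l);    p[0][2] = (r + l) / (r - l);
    p[1][1] = 2 * n / (t - b);    p[1][2] = (t + b) / (t - b);
    p[2][2] = -(f + n) / (f - n); p[2][3] = -2 * f * n / (f - n);
    p[3][2] = -1;
    return p;
}

// Transform a camera-space point to homogeneous NDC, then divide by w to get Cartesian NDC.
Vec4 project(const Mat4& p, double x, double y, double z) {
    Vec4 in = {x, y, z, 1.0}, h{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            h[i] += p[i][k] * in[k];
    return {h[0] / h[3], h[1] / h[3], h[2] / h[3], 1.0};  // w_ndc = -z_camera
}

int main() {
    Mat4 p = perspective(-1, 1, 1, -1, 1, 10);
    Vec4 nearPt = project(p, 0.5, 0.5, -1);    // on the near plane
    Vec4 farPt  = project(p, 0.5, 0.5, -10);   // on the far plane
    std::printf("near: %g %g %g\n", nearPt[0], nearPt[1], nearPt[2]);  // z_ndc = -1
    std::printf("far:  %g %g %g\n", farPt[0], farPt[1], farPt[2]);     // z_ndc = 1
    return 0;
}
```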
Note that sometimes the frustum we specify does not capture our entire scene, in which case, the points that were outside our frustum in camera space are mapped to the outside of the cube in NDC. These points are outside our field of view and should not be considered for rendering. Hence, when we later render what we see, we need to check for whether a given point is outside the cube. If it is, then we do not render the point.
Section 4: Rasterization

We render our 3D scene by mapping all of its points in NDC onto a 2D grid of square pixels that we then display on the screen. This process is known as rasterization. For each point in NDC, we map its $x$ and $y$ coordinates to a row ($r$) and column ($c$) ordered pair by rescaling the $[-1, 1]$ ranges of $x_{ndc}$ and $y_{ndc}$ to the column and row resolution of the grid. We then fill the pixel at $(r, c)$ to show the point in the display. The $z$ coordinate of our point is used for depth buffering and backface culling, which we will cover in the next assignment. Sometimes, the $(r, c)$ coordinates for the grid are referred to as screen coordinates.
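One common convention for this mapping is sketched below in C++. The exact rounding and the choice of whether row 0 is at the top are assumptions here (they depend on how the assignment defines its pixel grid), but the rescaling from $[-1, 1]$ to the grid resolution is the essential step.

```cpp
#include <cstdio>

// Map an NDC (x, y) pair in [-1, 1] x [-1, 1] to a (row, column) pair on an
// xres-by-yres pixel grid. Assumes row 0 is the top row of the displayed image.
void ndcToScreen(double x, double y, int xres, int yres, int& row, int& col) {
    col = static_cast<int>((x + 1.0) * 0.5 * (xres - 1));   // x = -1 -> leftmost column
    row = static_cast<int>((1.0 - y) * 0.5 * (yres - 1));   // y = +1 -> top row
}

int main() {
    int r, c;
    ndcToScreen(0.0, 0.0, 800, 600, r, c);   // the center of NDC -> middle of the grid
    std::printf("row %d, col %d\n", r, c);   // roughly (299, 399)
    return 0;
}
```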
In this section, we will talk specifically about line rasterization. In order to rasterize a line with a grid of pixels, we must decide which squares around the line form the best approximation of the line when filled. Figure 5 shows a visual of this in action:
Figure 5: Rasterizing a line by approximating it using a grid of pixels.
A classic algorithm that does this quickly, using only simple integer arithmetic, is Bresenham's line algorithm.
The basic Bresenham’s algorithm only works in one octant of the 2D coordinate plane, but it can be generalized to account for the other octants. The generalization will be left as an exercise in the assignment. For now, we will restrict ourselves to discussing the algorithm in just the first octant - i.e. line slopes are in the range $[0, 1]$. Since it is more natural to talk about lines in terms of $x$ and $y$, we are going to use $x$ to refer to the column values and $y$ to refer to the row values of our grid.
So we wish to connect two points $(x_0, y_0)$ and $(x_1, y_1)$ in the first octant of our grid with an approximation of a line. Let us decide to iterate over the x-coordinate. Then during our routine, if we had just filled in the pixel for some integral point $(x, y)$ with $x_0 \le x < x_1$ and $y_0 \le y \le y_1$, then the next pixel we fill in will have to be either at $(x + 1, y)$ or $(x + 1, y + 1)$. We must decide which one of these points results in a better approximation of the line.
Let $\epsilon$ denote the error between the $y$ of the integral point and the true $y$ in the actual line. The true $y$ is then equal to $y + \epsilon$. When moving from $x$ to $x + 1$, we increase the true $y$ by the slope $m = \frac{dy}{dx} = \frac{y_1 - y_0}{x_1 - x_0}$. Consider the case where $y + \epsilon + m$ is less than the midpoint of $y$ and $y + 1$ - i.e. $\epsilon + m < \frac{1}{2}$. If that is the case, then the next point in our line is closer to the pixel at $(x + 1, y)$ than the pixel at $(x + 1, y + 1)$. Hence, the next pixel that we would want to fill is at $(x + 1, y)$. We would then update $\epsilon$ as $\epsilon \leftarrow \epsilon + m$. However, if $\epsilon + m \ge \frac{1}{2}$, then the next pixel should be at $(x + 1, y + 1)$ and we would update $\epsilon$ as $\epsilon \leftarrow \epsilon + m - 1$. This leads us to the first-octant routine sketched below:
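The following C++ sketch follows the error update just described, using a floating-point $\epsilon$ and the slope $m$. The `fill` routine is a stand-in for whatever pixel-filling function the assignment framework provides.

```cpp
#include <cstdio>

// Stand-in for the framework's pixel-filling routine.
void fill(int x, int y) { std::printf("fill(%d, %d)\n", x, y); }

// First-octant line rasterization with a floating-point error term epsilon.
void drawLineFirstOctant(int x0, int y0, int x1, int y1) {
    double epsilon = 0.0;
    double m = static_cast<double>(y1 - y0) / (x1 - x0);   // slope in [0, 1]
    int y = y0;
    for (int x = x0; x <= x1; ++x) {
        fill(x, y);
        if (epsilon + m < 0.5) {
            epsilon += m;          // next pixel stays on the same row
        } else {
            epsilon += m - 1.0;    // next pixel steps up one row
            y += 1;
        }
    }
}

int main() {
    drawLineFirstOctant(0, 0, 8, 3);   // a shallow line from (0, 0) to (8, 3)
    return 0;
}
```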
We can optimize this algorithm by eliminating any dependence on floating point values. Consider the following manipulations on the inequality $\epsilon + m < \frac{1}{2}$:

$$\epsilon + m < \frac{1}{2} \;\;\Longrightarrow\;\; \epsilon + \frac{dy}{dx} < \frac{1}{2} \;\;\Longrightarrow\;\; 2\left(\epsilon \, dx + dy\right) < dx$$
Let $\epsilon' = \epsilon \, dx$. Our inequality becomes:

$$2\left(\epsilon' + dy\right) < dx$$
And we can express the updates $\epsilon \leftarrow \epsilon + m$ and $\epsilon \leftarrow \epsilon + m - 1$ as $\epsilon' \leftarrow \epsilon' + dy$ and $\epsilon' \leftarrow \epsilon' + dy - dx$ respectively. Since $dx$ and $dy$ are integers, $\epsilon'$ remains an integer throughout. This leads us to an algorithm with operations that involve only integers, sketched below:
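Here is the integer-only version of the sketch above, tracking $\epsilon' = \epsilon \, dx$ instead of $\epsilon$. As before, `fill` stands in for the framework's pixel-filling routine.

```cpp
#include <cstdio>

// Stand-in for the framework's pixel-filling routine.
void fill(int x, int y) { std::printf("fill(%d, %d)\n", x, y); }

// First-octant Bresenham line rasterization using only integer arithmetic.
void drawLineFirstOctant(int x0, int y0, int x1, int y1) {
    int dx = x1 - x0;
    int dy = y1 - y0;
    int eps = 0;                           // eps = epsilon * dx, always an integer
    int y = y0;
    for (int x = x0; x <= x1; ++x) {
        fill(x, y);
        if (2 * (eps + dy) < dx) {         // the multiply by 2 can be a bit-shift
            eps += dy;                     // next pixel stays on the same row
        } else {
            eps += dy - dx;                // next pixel steps up one row
            y += 1;
        }
    }
}

int main() {
    drawLineFirstOctant(0, 0, 8, 3);       // same shallow line as before
    return 0;
}
```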
Note how the only operations in this algorithm are simple (and fast) arithmetic operations between integers. In addition, the multiplication by 2 can be implemented using a bit-shift. Hence, we end up with a fast and efficient line drawing algorithm for the first octant. The generalization to all octants will also maintain the simplicity and efficiency. As we mentioned before, the generalization will be left as an exercise in the assignment.
Written by Kevin (Kevli) Li (Class of 2016).