3-D Primer


Despite the popular metaphor, there is no magical space inside a computer where objects exist.  People realize, of course, that even if a computer is showing a picture of a moving dinosaur, you can't crack open the computer and let the dinosaur out.  Even realizing that, however, many people fall prey to the fallacy that computer generated characters have some sort of objective reality, as if there were special software that makes computer creatures, identical in every way to their real counterparts.  This reality, people think, may not be physical, but it is nonetheless an objective, unarguable fact.  Words like "virtual" get bandied about.

This perception is reinforced by the fact that a small group of people work insanely hard to produce computer generated images that in fact do act like real objects.  But, as you'll see throughout this primer, such realism is achieved only through the application of talent, effort and (in the end) low-down dirty tricks.  All a computer knows is numbers.  Specialized programs go a long way towards turning those numbers into beautiful images, but it takes talent to go the final mile.

In order to simulate the existence of a 3-dimensional space, the computer stores numeric information about points in that hypothetical space.  In general, a point is represented by three numbers: its distance in each of the three dimensions from some arbitrarily chosen center point.  Each of these points is coded as being connected to other nearby points.  This in turn defines a surface in 3-dimensional space.  If this surface is closed, like a candy shell, it defines the mathematical outline of an object.
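As a concrete sketch, such a surface can be stored as a list of point coordinates plus a list of faces recording which points connect.  The names and the cube data here are purely illustrative, not any particular program's format:

```python
# Eight points of a cube, each measured from an arbitrary center point.
cube_points = [
    (-1, -1, -1), (1, -1, -1), (1, 1, -1), (-1, 1, -1),  # back four corners
    (-1, -1,  1), (1, -1,  1), (1, 1,  1), (-1, 1,  1),  # front four corners
]

# Six faces, each listing the indices of the points it connects.
# Together the faces form a closed surface: the cube's "candy shell".
cube_faces = [
    (0, 1, 2, 3), (4, 5, 6, 7),  # back, front
    (0, 1, 5, 4), (2, 3, 7, 6),  # bottom, top
    (0, 3, 7, 4), (1, 2, 6, 5),  # left, right
]
```

Because the shell is closed, every point is shared by exactly three of the cube's faces; that shared connectivity is what turns a bag of numbers into a surface.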

Once the computer has an idea of the object, it simulates samples through each of the pixels of a given image.  Each of these pixels is filled with a color based on the projection of the mathematical definition towards a given point in 3-dimensional space (colloquially, "the camera").  Obviously, if a human being were to attempt to make a picture in this way, it would be a nearly hopeless cause.  We do mathematics slowly, whereas we do a very good job of roughly estimating what something should look like.  Computers, on the other hand, are completely incapable of rough estimation, but they are incredibly good at mathematics.  It is because of this single-minded facility with mathematics that a computer can easily draw thousands upon thousands of polygons in the space of an eye blink.
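The heart of that projection is simple arithmetic: a point's position on the image is its horizontal and vertical distance divided by its depth.  This is a bare sketch of a pinhole-style camera, with the camera placement and focal length as illustrative assumptions:

```python
def project(point, focal_length=1.0):
    """Project a 3-d point onto a flat image, for a camera sitting
    at the center point and looking along the z axis."""
    x, y, z = point
    # A point twice as far away lands half as far from the image center.
    return (focal_length * x / z, focal_length * y / z)
```

A renderer repeats this (plus far more elaborate shading mathematics) for every point of every polygon visible through every pixel, which is exactly the kind of tireless arithmetic computers excel at.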

It is important to note that the computer only has information about the surface of an object.  The vast majority of computer animation, no matter how real it appears, is only working with the infinitely thin surface of a simulated object.  While there are computer systems that consider the entire volumes contained inside a set of polygons, they are so computationally expensive that their use is mainly restricted to specialized high end applications.

While polygons always meet with sharply defined edges, characters created with 3-d software generally have curved surfaces.  This effect can be achieved by several different methods.

First, computer programs can take advantage of the fact that the human eye only perceives a certain amount of detail, with the human mind filling in the rest based on its best guess.  Because of this, a model that is made out of a sufficiently large number of small polygons will appear to be smooth, simply because the eye cannot resolve enough information to sense the tiny edges.  However, this method requires vastly more polygons, which in turn requires more effort on the part of the CPU when it renders.  Also, when it comes time to animate the model, the large number of polygons can make it bulky and difficult to work with.

Second, a program can alter the way in which it calculates the shading on a point in a polygon.  Remember that the computer is simply deriving this from a formula involving the lights, camera and polygon.  Instead of calculating the color based on the actual facing of the polygon at each point, the program can act as if the facing curves as it approaches the edges of the polygon.  This facing is not mathematically consistent with the actual information of the polygon, but in most instances this will not be noticeable.  The result of the curving facing is that edges blend together without visible seams.  This method of smoothing is called "Surface Interpolation".
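The curving facing can be sketched as a blend between two facings (normals) that is scaled back to unit length: near one edge a point shades with one facing, near the other edge with the other, and in between the shading bends smoothly.  This is a simplified illustration of the idea, not any specific renderer's code:

```python
import math

def blend_facings(n0, n1, t):
    """Blend two unit-length facings; t runs from 0 (pure n0)
    to 1 (pure n1).  The result is rescaled to unit length so it
    can be fed straight into the shading formula."""
    nx = (1 - t) * n0[0] + t * n1[0]
    ny = (1 - t) * n0[1] + t * n1[1]
    nz = (1 - t) * n0[2] + t * n1[2]
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)
```

Two polygons that share an edge also share the facing at that edge, so their shading meets without a visible seam, even though the geometry underneath is still flat.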

The only problem with this method of adding curvature is that the actual underlying geometry has not been affected.  This is only noticeable around the very edges of a figure, but there it can produce a disconcertingly jagged effect.

For the most part, programs use a combination of the two methods above:  they create enough polygons in a smooth mesh to guarantee that the silhouette of the figure is not visibly jagged, and then they create a fictional curved facing on those polygons, to further smooth the appearance of the surface.  This is the technology in use in most of today's blockbuster animation movies.

A third option is becoming increasingly powerful.  Some programs are now developing facilities to work with surfaces more mathematically flexible than polygons.  By defining information about how a line should curve between its two endpoints, the program is able to make surfaces that are naturally without facets.  Also, because the curvature is mathematically defined, it affects both the silhouettes and the surface appearance of the model, without requiring an unwieldy number of points.  The problem with this method is that it requires even more computational power to render a sparse mesh of these curving surfaces than it does to render an extremely dense mesh of polygons.  However, with computer power increasing exponentially, this method is gaining increasing prominence.  This method of storage and rendering is described by many names, according to the specific mathematics involved, but generally all of the names have "Spline" somewhere in them ("Spline" being the general mathematical term for a curve between two end-points with curvature defined by additional numeric controls).
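One of the simplest members of the Spline family is the cubic Bezier curve: two endpoints plus two additional numeric controls that bend the curve between them.  Here is a one-dimensional sketch of evaluating it; real surface patches apply the same formula in two directions at once:

```python
def bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at t (0 to 1).  p0 and p3 are
    the endpoints; p1 and p2 are the controls that shape the
    curvature between them."""
    u = 1.0 - t
    return (u * u * u * p0
            + 3 * u * u * t * p1
            + 3 * u * t * t * p2
            + t * t * t * p3)
```

Because any value of t between 0 and 1 yields a point, the surface has no facets at all; smoothness comes from the mathematics rather than from sheer polygon count.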

A fourth method, which is extremely powerful in some specialized cases, is to represent geometry in a different way: specifically, points are recorded as before, but now each point has a geometric shape associated with it. In its simplest form, this shape is a sphere of a specified radius, but there are also systems that allow more complicated shapes.  When the computer searches for the surface of the object, it takes into account the influence of the shapes defined by all nearby points, and blends their shapes together before calculating the output color.  This often results in a slightly "gooey" looking output, with a feeling of surface tension between the various shapes. However, this output is often ideal for items such as smoke clouds, explosions, and of course liquids.  This method also has various names, but each name generally derives from the original name of "Metaballs" (creating names like "Metablocks" and so forth).
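The search for the surface can be sketched numerically: each ball contributes an influence that falls off with distance, and the surface lies wherever the summed influence crosses a threshold.  The inverse-square falloff below is one common, simple choice; actual implementations vary:

```python
def influence(point, balls):
    """Sum every ball's influence at a point.  Each ball is
    (cx, cy, cz, radius); influence falls off with squared distance."""
    total = 0.0
    for cx, cy, cz, radius in balls:
        d2 = ((point[0] - cx) ** 2 + (point[1] - cy) ** 2
              + (point[2] - cz) ** 2)
        if d2 == 0.0:
            return float("inf")  # at a ball's very center
        total += radius * radius / d2
    return total

def inside(point, balls, threshold=1.0):
    """A point is inside the blended object where influence is high."""
    return influence(point, balls) >= threshold
```

Note that the midpoint between two touching balls counts as "inside" even though it sits on neither sphere alone: that overlap of influences is exactly the gooey surface-tension blending described above.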

Characters made in a computer program often have very dense surface detail.  Dinosaurs have scales, aliens have strange flickering skin, and so on.  Fortunately, all of this detail does not need to be created polygon by polygon.  Instead, almost all computer 3-d programs support various implementations of a technology known as "Texture mapping".

The fundamental trick of texture mapping is that the computer takes a flat image and creates a set of numeric codes that tell it how to map that image onto a set of polygons.  In the case of a color texture map (the simplest type, and the example that will be used to explain the method), when the program needs to know the color of a particular point on a polygon, it uses the codes to discover where each of the polygon's corner points lies on the image, and then uses that information to deduce where the specified point would lie between those points in the image.  It then looks up the color stored at the calculated point on the image, and uses it throughout the rest of the calculations.  (Texture maps can also be applied to Spline based models; there is some difficulty, however, applying texture maps to meta-ball objects, because the varying scale of the "surface tension" areas between the meta-shapes makes it hard to pin down reference points on the flat image.)
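A sketch of the lookup: the corners' positions on the image are blended according to how close the surface point sits to each corner, and the resulting spot on the image supplies the color.  Nearest-pixel lookup is used here for simplicity; real renderers usually blend neighboring pixels as well:

```python
def blend_image_position(corner_positions, weights):
    """Blend the corners' (u, v) image positions by how strongly the
    surface point leans toward each corner (weights sum to 1)."""
    u = sum(w * p[0] for w, p in zip(weights, corner_positions))
    v = sum(w * p[1] for w, p in zip(weights, corner_positions))
    return (u, v)

def sample(image, uv):
    """Look up the stored color nearest to (u, v), where u and v
    each run from 0 to 1 across the flat image."""
    rows, cols = len(image), len(image[0])
    x = min(int(uv[0] * cols), cols - 1)
    y = min(int(uv[1] * rows), rows - 1)
    return image[y][x]
```

Because the blend is recomputed for every rendered point, the image appears stretched over the polygons and stays stuck to them from any viewing angle.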

The effect of this method is that the polygons look like an image has been stretched over them, and the image adheres to the surface of the 3-d model no matter what angle it is viewed from.

If only colors could be mapped, however, the only benefit would be that a polygon could be painted with color.  Color mapping alone could not produce fine details like scales, because the flat nature of the polygon would be evident as the viewpoint was turned this way and that.

However, there is another kind of texture mapping called "Bump Mapping".  In this case, the flat image being mapped is made of various shades of gray, representing the elevation of the image, with black being the lowest point, white being the highest point, and shades in between representing intermediate elevations.  The image is attached to the polygons in the same way as with a color map, but the program uses it differently in calculations.

When the program is calculating the facing of a particular point on a polygon, it has the option to bend this facing away from the true facing of the polygon as a whole.  This was discussed earlier in regard to smoothing polygon surfaces by Surface Interpolation.  The same technique is used in Bump Mapping.  The computer calculates the point on the bump image that corresponds to the point on the polygon being rendered (call this the Reference Point).  The curvature of the bump image itself can be calculated by looking at the values surrounding the Reference Point on the bump image.  This facing is then added to the facing that would otherwise have been applied to this point on the polygon, and the rendering continues.
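The curvature around the Reference Point reduces to a slope: how quickly the gray values rise to the left and right, and above and below.  In this sketch the flat base facing of (0, 0, 1) and the strength factor are illustrative assumptions, not any program's actual convention:

```python
import math

def bump_facing(heights, x, y, strength=1.0):
    """Tilt a flat facing of (0, 0, 1) by the slope of the gray
    values around the Reference Point (x, y) in the bump image."""
    slope_x = heights[y][x + 1] - heights[y][x - 1]  # rise to the right
    slope_y = heights[y + 1][x] - heights[y - 1][x]  # rise downward
    nx = -strength * slope_x
    ny = -strength * slope_y
    nz = 2.0  # the spacing spanned by the two neighboring samples
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)
```

Where the bump image is a uniform gray the facing stays untouched; where brightness climbs, the facing tips away from the slope, and the shading formula does the rest.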

The result of this method is surprisingly effective.  The human eye tracks minute variations in the elevation of a surface by observing shadows and highlights on the surface.  These shadows and highlights can also be created by varying the facing of individual points of the polygon, even without any change in the actual location of the point in 3-dimensional space.  Because the human perceptual system is cued to interpret these shadows and highlights as relating to surface variation, the effect of a bump map is to allow strikingly realistic surface variation to be mapped onto fundamentally flat polygons.

While color and elevation are certainly the first things that human beings notice about a surface, there are many other features which define the appearance of an object.  The amount of glossiness in its response to light, the amount of reflectivity, and the degree to which it scatters light in different directions, all contribute to the appearance of an object.  Most powerful 3-dimensional programs will provide the capability to alter all of these settings both for an individual polygon and on a point-by-point basis by the application of Texture Maps.

The most compelling results are created when all of these maps are used together, with color and glossiness reinforcing the surface features picked out by a Bump Map. When used in combination, Texture Maps can transform a model from an unrealistically perfect mathematical abstraction into a creature so real that you feel you could reach out and touch it.

First, a basic digression into the nature of perceived movement is in order.  The human eye does not, as has been said earlier, perceive everything in perfect detail.  The mind does a large amount of guesswork at a pre-cognitive level before the part of your mind that you think of as "you" ever gets to look at the images your eyes are reporting.  This is why optical illusions persist, even when one understands the mechanism of the illusion.

The most basic and most important optical illusion is also the simplest:  a series of still images, each slightly different from the one before, projected in front of the eyes one after the other, will appear to move.  There is no movement of course: there is just a series of still images.  But the mind perceives movement.

This is the basis of movies (which stretch out distinct photographic frames on a long roll of film, and project them at 24 frames per second) and television (which receives signals to scan the surface of a phosphor screen, completely refreshing the screen 30 times every second).  It is also, not surprisingly, the fundamental "trick" to allowing a computer to create movement.

The basic trick is deceptively simple.  You render out an image (as discussed above) of your model.  Then you move the camera, or change the geometry of the model, or both.  Then you render out another image of the new situation.  In the end, you render out a huge number of these images, and then project them to the human eye, one after the other, very quickly.  Miraculously, movement is perceived.

Of course, the basic trick is not the whole story.  Remember that you are dealing with (conceivably) hundreds of thousands of polygons and points, and each of them needs to be positioned correctly in every frame of output.  And, as mentioned above, you're outputting thirty or twenty-four frames for every single second of output.  If an animator had to reposition every point by hand for every frame, they would need to make millions of individual settings for every second of output.  Obviously, this would only be feasible for the most obsessive individuals.

Fortunately, the computer's prodigious strength in mathematics comes to the aid of the animator in many ways.  The most basic thing that a computer can do is to allow the animator to specify only some of the frames (the "key frames") and then to interpolate the positions of the points in all of the other frames (the "in between frames").  The computer counts on the fact that the points will move regularly between the specified frames, and thus relieves the animator of the burden of calculating the proper increment of movement for every single frame.  Any animation program provides this functionality.
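The most basic in-betweening can be sketched in a few lines: the animator supplies values only at the key frames, and the computer fills every intermediate frame by stepping regularly between them.  This is plain linear interpolation; real programs usually offer smoother easing curves as well:

```python
def in_between(key_frames, frame):
    """key_frames maps frame number -> value at that key frame.
    Returns the interpolated value at any frame in between."""
    frames = sorted(key_frames)
    if frame <= frames[0]:
        return key_frames[frames[0]]
    if frame >= frames[-1]:
        return key_frames[frames[-1]]
    for f0, f1 in zip(frames, frames[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)  # how far along this interval
            return key_frames[f0] + t * (key_frames[f1] - key_frames[f0])
```

Two key frames thirty frames apart thus generate twenty-nine in-between positions for free, one per rendered frame.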

Beyond this functionality, almost all animation programs provide the ability to move, rotate and scale a group of points around a specified central point.  This allows the animator to select a group of points (the arm, for example), specify a pivot point (the shoulder) and then rotate the entire group (moving the arm up and down).  This helps to reduce the number of operations that the animator needs to do in order to move a large number of points.
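Rotating a group around a pivot is three steps per point: measure the point's offset from the pivot, rotate that offset, and add it back.  A 2-dimensional sketch follows; the 3-dimensional version works the same way, one axis at a time:

```python
import math

def rotate_group(points, pivot, angle):
    """Rotate every (x, y) point in the group around the pivot
    by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    px, py = pivot
    rotated = []
    for x, y in points:
        dx, dy = x - px, y - py  # offset measured from the pivot
        rotated.append((px + dx * c - dy * s,
                        py + dx * s + dy * c))
    return rotated
```

With the pivot at the shoulder, a single quarter-turn command swings every point of the arm together, however many points it contains.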

For many 3-d applications, the above functionality is all that is needed.  If, for example, you have a spaceship and you want to fly it over a planet, all you need to do is to group the spaceship, then move and rotate it at each key frame as you see fit.  Because the ship does not move relative to itself (i.e. it doesn't bend, or flex), there is no need for advanced techniques.  However, in a situation where you have a living character that needs to move limbs, express emotions, and generally have complicated motion, an animator can generally benefit from more powerful tools.

More advanced animation programs provide the ability to permanently group parts of a character and associate them with a given pivot.  Generally such packages also allow these groups to be organized hierarchically: for example, a torso group could have the arms as its children.  When the torso moved, the arms would move with it, but the arms could also be rotated separately around their own pivot.  This is generally referred to as "segmenting" a model.  The calculations involved in moving the arms when the torso moves are referred to as "Forward Kinematics".

A subtle variation on this method is to create structures called "bones" inside of the character, to simulate the working of the skeleton.  Points on the surface of the model are assigned to these bones, and thereafter move as the bones move.  Some implementations allow points to be assigned to more than one bone, averaging the movement that they would have received from each.  Other implementations restrict points to being assigned to a single bone, but allow animators more freedom in terms of controlling and interpolating bones.  The bones are organized with connections in much the same way that a typical skeleton is organized, and so a complex network of movement can be created simply by dragging bones back and forth.  This is referred to as "skeletal animation".
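The multi-bone assignment can be sketched as a weighted average: each surface point carries a weight for each of its bones, and its final movement blends the movement each bone alone would have given it.  Translations only here, for brevity; real implementations blend rotations as well:

```python
def skin_point(point, bone_moves, weights):
    """Move one surface point by the weighted average of its bones'
    movements.  bone_moves holds each bone's (dx, dy) for this frame;
    weights (summing to 1) say how strongly each bone owns the point."""
    dx = sum(w * move[0] for w, move in zip(weights, bone_moves))
    dy = sum(w * move[1] for w, move in zip(weights, bone_moves))
    return (point[0] + dx, point[1] + dy)
```

A point near the elbow might weight the upper-arm and forearm bones half-and-half, so the skin stretches smoothly across the joint instead of creasing.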

Skeletal animation lends itself to another technique of great use to animators, called "Inverse Kinematics".  Whereas in Forward Kinematics you would move the bicep and watch the hand move as a result, in Inverse Kinematics you can move the hand, and the computer will calculate how the bicep would need to move in order to accommodate your requested hand movement.  Because there are often several different ways in which the bicep could move, the computer must follow some method of choosing among them.  Some implementations of Inverse Kinematics (or IK) have more intelligent choice methods, some have less intelligent ones:  it's simply a matter of what software is in use.
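For a flat two-bone chain the calculation reduces to the law of cosines, and the "choosing" mentioned above shows up directly in the code: the elbow could bend to either side, and this sketch simply picks one of the two mirror solutions (all names and conventions here are illustrative):

```python
import math

def two_bone_ik(l1, l2, target):
    """Given upper-arm length l1, forearm length l2 and a target for
    the hand, return (shoulder_angle, elbow_bend) in radians."""
    tx, ty = target
    d = min(math.sqrt(tx * tx + ty * ty), l1 + l2)  # clamp the unreachable
    # Law of cosines: how far the elbow must bend away from straight...
    cos_e = (d * d - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    elbow = math.acos(max(-1.0, min(1.0, cos_e)))
    # ...and how far the shoulder must swing off the line to the target.
    cos_s = (d * d + l1 * l1 - l2 * l2) / (2 * l1 * d)
    shoulder = math.atan2(ty, tx) - math.acos(max(-1.0, min(1.0, cos_s)))
    return shoulder, elbow
```

Swapping the sign of the shoulder correction yields the mirror-image pose; a smarter IK engine would pick between the two based on joint limits or the previous frame.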

Another capability that advanced Skeletal implementations will often provide is the ability to mathematically connect the movements of disparate bones.  For example, a character could be configured in such a way that their eyes always point at the end of their finger.  Some programs offer this functionality by allowing the user to write mathematical Expressions which are evaluated by the computer. Other programs give a toolbox of possible Constraints, each applying a certain type of restriction to a bone, and all of them interconnectable.  In general, the things that can be done are the same in each system, but some things are easier to do in Expressions, while others are simpler in Constraints.

One type of Constraint that is available in almost all Skeletal implementations is a constraint on the range of motion of a bone, relative to its parent bone.  This type of functionality allows an animator to specify that (for example) an elbow cannot bend backwards.  This is particularly useful for aiding an Inverse Kinematics engine in coming to an appropriate solution.

Skeletal animation, though very useful, tends to produce animation which seems a little rigid and linear.  While you can add more bones, Expressions and Constraints to create somewhat smoother movement, most animators eventually find that they need direct access to the control points (for items like facial animation, muscle bulging, cloth animation and so on).

Some programs provide yet another way of dealing with the points of the model.  They allow the user to specify an area in space, and then Deform that area, which results in all the points inside of the area being deformed similarly.  This is particularly good for dealing with large amounts of fatty tissue, permitting the animator to create very natural bouncing movement.

Another method, often called Morph Targets, allows the animator to specify a set of "Poses": these are sets of data which specify offsets for individual points in the model.  The power of a Morph Target system is that it allows the animator to combine these Poses with each other seamlessly, and to combine partial percentages of individual Poses.  So, for example, an animator could create a pose for "Raise Right Eyebrow", a pose for "Open Right Eyelid" and a pose for "Smile".  Then, to create a curious expression, the animator could simply specify "Raise Right Eyebrow" at 80 percent, "Open Right Eyelid" at 100 percent and "Smile" at 20 percent.  These Poses would then be seamlessly combined by the software and result in a facial expression with the eyebrow cocked over an open right eye, and the hint of a smile on the lips.
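Blending Poses reduces to scaling each Pose's per-point offsets by its percentage and summing them onto the neutral model.  The pose names below are taken from the example above; the data is illustrative:

```python
def blend_poses(base_points, poses, amounts):
    """base_points: the neutral model.  poses: name -> list of
    per-point (dx, dy, dz) offsets.  amounts: name -> fraction
    (1.0 = 100 percent).  Returns the blended point positions."""
    result = list(base_points)
    for name, amount in amounts.items():
        for i, (dx, dy, dz) in enumerate(poses[name]):
            x, y, z = result[i]
            result[i] = (x + amount * dx, y + amount * dy, z + amount * dz)
    return result
```

Because the offsets simply add, any number of partial Poses combine seamlessly, which is exactly what makes the system so expressive for facial work.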

An item to mention in this regard is the question of how many points have to be modified by hand to, for example, establish these Poses.  Theoretically there is no difference between moving a large number of control points and moving a small number of control points, so long as the defined surface is the same.  But in practice, moving more points is more of an effort, which means the animator has less time to experiment.  Also, when moving a large number of points the animator has more opportunities to inadvertently introduce errors into the animation.  For this reason, many character animators (who spend a large amount of time tweaking individual points) prefer to work in a system that has fewer points, but where each point controls a larger area of the surface.  Spline systems (see above) are generally well regarded in this arena.

The final general area of assistance to animators is that of Computer Simulation. This covers a broad range of topics, but it basically means that the computer does some of the animation for you, according to laws that simulate (some better, some worse) the laws of physics (or some other phenomenon).

Typically, Simulation code is very specialized.  For example, a program might provide a feature that prevents objects from passing through each other (called Collision Detection).  Or, a program might provide a feature that automatically animates a character in response to gravity (what goes up must come down, and all that).  Some advanced systems can handle much more complicated items, such as cloth flapping in a breeze:  others apply simple rules over and over to a multitude of small objects (particle systems and flocking systems).  Still other systems will attempt to reproduce a certain type of movement (for example, there is a program that lets you plot out footsteps, and then attempts to move a bipedal character in such a way that it walks in those footsteps).  As yet, there is no code that accurately simulates even a fraction of the phenomena of the physical world.
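A particle system is the clearest case of "simple rules applied over and over": each frame, every particle's velocity is nudged by gravity and its position by its velocity.  A minimal 2-dimensional sketch, with the 30-frames-per-second step as an assumption:

```python
def step_particles(particles, gravity=-9.8, dt=1.0 / 30.0):
    """Advance every (x, y, vx, vy) particle by one frame."""
    stepped = []
    for x, y, vx, vy in particles:
        vy = vy + gravity * dt  # gravity tugs the velocity downward
        stepped.append((x + vx * dt, y + vy * dt, vx, vy))
    return stepped
```

What goes up must come down: run the rule for enough frames and every particle's upward velocity turns negative, with no animator touching a single point by hand.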

In general, Simulation engines are good labor-saving devices in a situation where the results do not have to be particularly aesthetically balanced.  But many animators avoid using Simulation engines wherever possible, precisely because of the loss of control that they imply, and because the results are often sketchy at best.

For all of the tricks that are available, the basic means of animating computer graphics is still largely "by hand".  Whether it is using Inverse Kinematics, or moving individual points of the model, most animation is achieved because the animators have learned and practiced the principles of motion, and apply that knowledge to the program.

This is possible precisely because video is, after all, just a series of electrical impulses. In fact, the information stored on a video tape is very much the same as the information that is output by an animation program:  it is a series of color samples which are arranged in a grid across the screen in order to present a coherent picture.

The way that computer graphics is integrated with live video is conceptually simple, although in practice it can be agonizing.  Through some method, the video is loaded into a computer as a series of images.  A computer animation is output as another set of images.  And then these two sets of images are composited together by some other software package. Finally, the resulting images are sent back out to video tape, which can then be shown on television.  A similar (but much more demanding and costly) process is used to transfer computer graphics to film.

The problems arise when you try to tell the computer about the real world.  In order to composite your computer dinosaur with live video, for example, you may need the dinosaur to walk behind a tree from the live video.  In order to do this, you need to tell the computer where the tree is: generally this is accomplished by creating a model of the tree in your 3-d program, and telling the program to use that model to obscure the picture of the dinosaur when it renders.  Of course, you need to get the model of the tree absolutely, positively, exactly correct, or else the composition of the two images will look bizarre (it will either look like part of the dinosaur vanishes as it passes near the tree, or part of the tree vanishes as the dinosaur nears).  And that's for a relatively simple, unmoving object like a tree:  the problems are only magnified when moving actors are introduced.

Still, the striking images that can be created by combining live action and computer graphics are well worth the effort.

There are basically three possible reasons why computer graphics sometimes sticks out like a sore thumb.

The first reason is poor composition:  sometimes, particularly when a piece is completed under a tight deadline, a computer scene is modeled or rendered without appropriate attention paid to synchronizing all possible elements with the live action scene.

For example, sometimes shadows on computer graphics will not point in the same direction as the shadows in the live video, because the lights were not placed in the proper positions.  Often computer graphics will look either too dark or too light, or just washed out, because the lights were not set to the proper values, or the surface features of the model are not appropriate to the environment it has been placed in.

Even more obvious, sometimes small bits of the original background of the computer graphics will be transplanted when the character is positioned on the new background. This creates a "haloing" effect which is, for some reason, incredibly obvious and distracting to the human perceptual system.

The second reason that a piece might stick out is plain old poor modeling.  Particularly with extreme low budget productions, characters and props are often created that look just plain bad.  Sometimes there are obvious flaws of geometry (infinitely thin planes showing edge-on, or visibly hollow objects).  Sometimes the surface features are very detailed in one sense (detailed color maps, or bump maps) but lack similar detail in other aspects (for example, a detailed color map over a flat polygon).  Whatever the case, some models just come across as impossible to believe in.

However, both of the above reasons are comparatively rare:  they show up in features that have not paid proper attention to the needs of computer animation, but generally do not appear in more advanced features.  The third reason that a piece might stick out is the most insidious and dangerous one.  Simply put, the object might be too perfect to exist in the real world.

In the real world, things get nicked.  They get banged.  Flat surfaces get dented, and detailed surfaces get blurred by dirt and grime.  Moreover, even straight off the showroom floor, as it were, most objects in the real world have built-in imperfections. Nothing is ever built precisely to specification:  millimeter variances in a surface may seem irrelevant when you're thinking about modeling something, but you can be absolutely sure that a person looking at the model will note the absence of the gently mottled lighting that they're accustomed to... the mottled lighting that is the result of those very millimeter variances!

For a long period of time, the art of special effects filming was a fight against imperfection:  when your model was filmed frame after frame, you couldn't afford to accidentally dent it.  If you did, the dent would appear from one frame to the next, and would look like something had slammed into the model.  Similarly, models had to be polished to remove dust, grime and the bodily oils that come off of the fingers of poor, put-upon stop-motion animators.  The fight against entropy must have seemed nearly unwinnable in those days.

The irony of this is that now, with mathematical perfection easily attainable in computer graphics, the search has turned towards ways to create imperfection. But that's pretty much the way it goes with a young art form like computer animation: the most unexpected things can happen.