Moving 3D objects with a 2D mouse

Here's the problem: you're making a 3D modeler, level editor, or other design program, and you need to move objects around with the mouse. You want the object to stay under the cursor rather than move faster or slower than the mouse, because (although the naive approach is simple) that would look cheesy.

But how? There's not much good information on doing this anywhere on the web - at least, not that I've been able to find. So I'll share how I did it, with good results: it works whatever direction the camera faces. Four viewports? No problem. Perspective view? Ditto.

I won't be going into much background information, so you should have at least a basic understanding of 3D graphics programming, including vectors, matrices, coordinates, and 3D spaces.

I'll be using plain old DirectX 10, but the features I use are nearly identical in DX 9, and you can do the same things with DX 11, except there you'll want to use the more up-to-date DirectXMath library (formerly XNA Math) instead of D3DX. OpenGL has the same features, although it does a few things differently; I'll try to mention these when they come up. XNA could be used as well, since it has most of the features DX has and works the same way for the most part. In each case, check the documentation for the details.

Back to our problem, there are several ways to do this; the most common are:

  • Make a plane parallel to the camera, at the same distance away as the object. When the mouse moves, project a ray from the camera, through the plane. The point where it hits the plane is where you put the object.

  • Calculate the depth of the object, transform the mouse coordinates into projection space, and calculate the offset between the object and the mouse (both in projection space). Whenever the mouse moves, transform it to projection space, add the offset, and transform to world coordinates to get the new object position.
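For comparison, the first approach comes down to a ray-plane intersection. Here's a minimal sketch of that step in plain C++. Vec3 is a stand-in for D3DXVECTOR3 so the snippet compiles anywhere, and RayPlaneHit and its helpers are names I made up for illustration, not library functions.

```cpp
#include <cmath>

// Stand-in for D3DXVECTOR3 (assumption: any 3-float vector type will do)
struct Vec3 { float x, y, z; };

static Vec3 Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Intersects the ray (origin + t * dir) with the plane that passes through
// planePoint and has normal planeNormal. Returns false if the ray is
// parallel to the plane or the hit would be behind the ray origin.
bool RayPlaneHit(Vec3 origin, Vec3 dir, Vec3 planePoint, Vec3 planeNormal, Vec3 *hit) {
    float denom = Dot(dir, planeNormal);
    if (std::fabs(denom) < 1e-6f) return false;   // ray runs along the plane

    float t = Dot(Sub(planePoint, origin), planeNormal) / denom;
    if (t < 0.0f) return false;                   // plane is behind the camera

    hit->x = origin.x + dir.x * t;
    hit->y = origin.y + dir.y * t;
    hit->z = origin.z + dir.z * t;
    return true;
}
```

For the drag case, planePoint would be the object's position and planeNormal the camera's forward direction, which makes the plane parallel to the camera at the object's distance.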

The second option is the one I'll use here.

First, we need world-to-projection and projection-to-world functions. These will take in a Vector3 and return a Vector3, and are pretty simple: for world-to-projection, just transform the input by the camera matrix, and then the projection matrix.

D3DXVECTOR3 *WorldToProj(D3DXVECTOR3 *out, D3DXVECTOR3 *in) {
    // Multiply vectors by camera and projection matrices
    D3DXVECTOR4 working(in->x, in->y, in->z, 1.0f);
    D3DXVec4Transform(&working, &working, viewMatrix);
    D3DXVec4Transform(&working, &working, projMatrix);

    // Save as a 3 dimensional vector
    out->x = working.x / working.w;
    out->y = working.y / working.w;
    out->z = working.z / working.w;

    return out;
}

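The variables viewMatrix and projMatrix need to point to (unsurprisingly) the view and projection matrices; they are D3DXMATRIX pointers here, which is why they're passed without an &. Now for the projection-to-world function, we multiply the input by the inverse of the combined view and projection matrix.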

D3DXVECTOR3 *ProjToWorld(D3DXVECTOR3 *out, D3DXVECTOR3 *in) {
    // Multiply vectors by inverse of cam/proj matrix
    D3DXVECTOR4 working;
    D3DXMATRIX camProj, invCamProj;
    D3DXMatrixMultiply(&camProj, viewMatrix, projMatrix);
    D3DXMatrixInverse(&invCamProj, NULL, &camProj);
    D3DXVec3Transform(&working, in, &invCamProj);

    // Save as a 3 dimensional vector
    out->x = working.x / working.w;
    out->y = working.y / working.w;
    out->z = working.z / working.w;

    return out;
}

Note: If you're using OpenGL you need to switch the matrix multiplication - put projMatrix first and then viewMatrix. Check the documentation.
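If you want to check the round trip without DirectX installed, here's a self-contained sketch of the same two functions. Everything in it is a stand-in I wrote for illustration: Vec3 and Mat4 replace the D3DX types, the matrices are passed in as parameters instead of being globals, Invert is a plain Gauss-Jordan inverse, and PerspectiveFovLH builds the usual left-handed D3D-style perspective matrix (row vectors, depth from 0 to 1).

```cpp
#include <cmath>
#include <utility>

struct Vec3 { float x, y, z; };   // stand-in for D3DXVECTOR3
struct Mat4 { float m[4][4]; };   // stand-in for D3DXMATRIX (row vectors)

// Computes (v.x, v.y, v.z, 1) * M and divides the result by w
Vec3 TransformCoord(const Vec3 &v, const Mat4 &M) {
    const float in[4] = {v.x, v.y, v.z, 1.0f};
    float out[4];
    for (int c = 0; c < 4; ++c) {
        out[c] = 0.0f;
        for (int r = 0; r < 4; ++r) out[c] += in[r] * M.m[r][c];
    }
    return {out[0] / out[3], out[1] / out[3], out[2] / out[3]};
}

Mat4 Multiply(const Mat4 &a, const Mat4 &b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Gauss-Jordan elimination with partial pivoting; false if src is singular
bool Invert(const Mat4 &src, Mat4 *dst) {
    float a[4][8];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            a[i][j] = src.m[i][j];
            a[i][j + 4] = (i == j) ? 1.0f : 0.0f;
        }
    for (int col = 0; col < 4; ++col) {
        int pivot = col;
        for (int r = col + 1; r < 4; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[pivot][col])) pivot = r;
        if (std::fabs(a[pivot][col]) < 1e-12f) return false;
        for (int j = 0; j < 8; ++j) std::swap(a[col][j], a[pivot][j]);
        float inv = 1.0f / a[col][col];
        for (int j = 0; j < 8; ++j) a[col][j] *= inv;
        for (int r = 0; r < 4; ++r) {
            if (r == col) continue;
            float f = a[r][col];
            for (int j = 0; j < 8; ++j) a[r][j] -= f * a[col][j];
        }
    }
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) dst->m[i][j] = a[i][j + 4];
    return true;
}

// Left-handed perspective matrix, depth mapped to 0..1
Mat4 PerspectiveFovLH(float fovY, float aspect, float zn, float zf) {
    float yScale = 1.0f / std::tan(fovY * 0.5f);
    Mat4 p = {};
    p.m[0][0] = yScale / aspect;
    p.m[1][1] = yScale;
    p.m[2][2] = zf / (zf - zn);
    p.m[2][3] = 1.0f;
    p.m[3][2] = -zn * zf / (zf - zn);
    return p;
}

Vec3 WorldToProj(const Vec3 &world, const Mat4 &view, const Mat4 &proj) {
    return TransformCoord(world, Multiply(view, proj));
}

bool ProjToWorld(const Vec3 &p, const Mat4 &view, const Mat4 &proj, Vec3 *world) {
    Mat4 inv;
    if (!Invert(Multiply(view, proj), &inv)) return false;
    *world = TransformCoord(p, inv);
    return true;
}
```

Feeding the result of WorldToProj back into ProjToWorld should give you the original point again, up to floating point error. In a real editor you'd probably cache the inverse matrix rather than recompute it on every call.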

These will be the main functions we use, since we will mostly be working in projection space. But we will need one other function to make our lives easier:

void GetMousePosInProjSpace(float *outX, float *outY) {
    // Move the mouse into projection space
    float x = (float)mousePos.x;
    float y = (float)mousePos.y;
    float wdiv2 = (float)viewportSize.x * 0.5f; // Half of viewport width
    float hdiv2 = (float)viewportSize.y * 0.5f; // Half of viewport height

    *outX = (x - wdiv2) / wdiv2;
    *outY = -(y - hdiv2) / hdiv2;
}

This function will take a mouse position in screen space (0 to viewport width, 0 to viewport height) and convert it to projection coordinates (-1.0 to 1.0 on both axes). We need to invert the y coordinate because screen-space y grows downward, while in projection space y points up.

The function will also need two member variables (or globals, depending on your project): POINT mousePos, and POINT viewportSize. POINT just contains two integer values x and y.
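If you have several viewports in one window, the mouse position has to be made relative to the viewport first. Here's a small sketch of that variant. Point2i stands in for POINT, and the origin parameter (the viewport's top-left corner in window pixels) is my own addition, not something the function above has.

```cpp
// Stand-in for the Win32 POINT struct (two integers, x and y)
struct Point2i { int x, y; };

// Converts a mouse position in window pixels to projection space for the
// viewport whose top-left corner is `origin` and whose size is `size`.
void ScreenToProj(Point2i mouse, Point2i origin, Point2i size,
                  float *outX, float *outY) {
    float halfW = (float)size.x * 0.5f;   // Half of viewport width
    float halfH = (float)size.y * 0.5f;   // Half of viewport height

    // Make the mouse position relative to the viewport
    float localX = (float)(mouse.x - origin.x);
    float localY = (float)(mouse.y - origin.y);

    *outX = (localX - halfW) / halfW;
    *outY = -(localY - halfH) / halfH;    // Flip y: screen y grows downward
}
```

For an 800 x 600 viewport starting at (800, 0), the viewport center (1200, 300) maps to (0, 0), its top-left corner maps to (-1, 1), and its bottom-right corner maps to (1, -1).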

Now to actually perform the translation, we'll need four events: begin translation, mouse move, update, and end translation. When and how you will invoke these events is up to you. Also, you could do away with the update and just update the position in the mouse move event if you want. But graphics engines generally have a place to update objects, so if there is one, you might as well update the object there.

First, to begin translation: this could be done by holding a mouse button down, pushing a certain hotkey, or by clicking on a handle (like most 3D modelers). If you choose the handles route, making them is your problem. I'll just go through the actual movements. Also, to make things simple, I won't show you how to constrain the movement to a certain axis, although it shouldn't be very hard to implement yourself.

You'll need a few variables:

bool translating;              // True if we're currently in translation mode
D3DXVECTOR3 objTransformStart; // Original object position
D3DXVECTOR3 objTransform;      // Current object position
float objectDepth;             // Depth of the object (z-coord in proj space)
D3DXVECTOR3 offset;            // Offset of the object from the mouse

void BeginTranslate() {
    translating = true;

    // We need to project the object to get its proj space coords
    D3DXVECTOR3 out;
    WorldToProj(&out, &objTransformStart);
    objectDepth = out.z;

    float mx, my;
    GetMousePosInProjSpace(&mx, &my);

    // Subtract to get the offset
    offset.x = out.x - mx;
    offset.y = out.y - my;
    offset.z = 0.0f;
}

First we project the object's starting position, objTransformStart, into projection space. The only reason we keep objTransformStart around afterwards is in case the user cancels the translation operation in the middle - such as by pressing Esc. We save the projected z value as objectDepth; it encodes how far the object is from the camera, and it stays fixed for the whole drag. We then get the offset from the mouse to the object, in projection space.

The reason we need the depth is that there isn't just one 3D point that corresponds to a 2D point. There are, in fact, infinitely many points in 3D space that correspond to it: every point on the ray that goes from the camera through the mouse. So if we save the depth, then when we move the object we can set it to (x, y, depth), where x and y come from the mouse, and that picks out exactly one point on the ray. The offset, meanwhile, keeps the object from jumping so that its center sits under the cursor the moment the drag starts.

The MouseMoved function we'll make next:

void MouseMoved(int x, int y) {
    if (translating) {
        // Calculate mouse position (with depth of object position)
        D3DXVECTOR3 mouse;
        GetMousePosInProjSpace(&mouse.x, &mouse.y);
        mouse.z = objectDepth;

        // Calculate object pos using mouse + offset and convert to world coords
        D3DXVec3Add(&objTransform, &mouse, &offset);
        ProjToWorld(&objTransform, &objTransform);
    }
}

This should be pretty self-explanatory. First we convert the mouse position from screen to projection coordinates. Keeping in mind what I said earlier, we set the mouse's z value to the depth the object had at the beginning of the translation.

We'll add the offset to the current mouse position, convert to world space, and save it as the object's position.

You can write the update event yourself: just set the object's position to objTransform. No sweat.

The end event is simple too: if the user cancelled, set the object's position to objTransformStart. Otherwise, set it to objTransform. Set translating to false and that's it!
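In case it helps, here's one way the update and end events could look. This is only a sketch: I've bundled the variables into a hypothetical TranslateState struct, and the functions return the position instead of writing it to an object, since how your engine stores positions is up to you.

```cpp
// Stand-in for D3DXVECTOR3
struct Vec3 { float x, y, z; };

// Hypothetical bundle of the translation variables
struct TranslateState {
    bool translating = false;
    Vec3 start = {0.0f, 0.0f, 0.0f};     // objTransformStart
    Vec3 current = {0.0f, 0.0f, 0.0f};   // objTransform
};

// Update event: while dragging, the object follows objTransform;
// otherwise it keeps whatever position it already had.
Vec3 Update(const TranslateState &s, Vec3 objectPos) {
    return s.translating ? s.current : objectPos;
}

// End event: returns the final position and leaves translation mode.
// Pass cancelled = true if the user backed out (for example with Esc).
Vec3 EndTranslate(TranslateState &s, bool cancelled) {
    Vec3 result = cancelled ? s.start : s.current;
    s.translating = false;
    return result;
}
```

Returning the start position on cancel is the whole reason objTransformStart was saved when the translation began.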

One thing I will note: DirectX and OpenGL both have project and unproject functions (D3DXVec3Project and D3DXVec3Unproject in D3DX, gluProject and gluUnProject in GLU). You can use them if you want; just check out the documentation for them. They don't work exactly the same as my WorldToProj and ProjToWorld because they go all the way to and from screen coordinates, so they also take the viewport as a parameter. My functions aren't super complicated, but if you find a way to use the (un)project functions instead then go right ahead. Or if you want to do the ray casting thing I mentioned at the beginning, you can use them for that too. They don't work exactly how they may seem, though, so read the documentation before relying on them.

That said, go ahead and use this code for whatever you want, play with it, add features, make it hide the mouse and keep it in the same place so it never hits the edge of the screen, or maybe try to rotate/scale as well using the same principles. It's actually incredibly simple once you figure it out.