Programmer’s guide to homogeneous coordinates (hackernoon.com)
102 points by signa11 on May 14, 2017 | 13 comments


The easiest way to understand homogeneous coordinates in general, IMO, is to understand the 2D case.

In that case, your homogeneous coordinates have 3 components, and it's pretty easy to visualize them in 3D space. The z=1 plane is your actual 2D plane. Now consider the set of all lines passing through the 3D origin: each one is either parallel to the z=1 plane or intersects it at precisely one point. That point of intersection is (x0, y0, 1), and if you multiply it by any nonzero constant factor, the point just moves along its corresponding line.

Thus, any 3D point with z != 0 can be mapped to your 2D plane by simply dividing by z, and each line through the origin that isn't parallel to the plane corresponds to exactly one 2D point. Points where z = 0 don't map to the z=1 plane; they're considered "infinitely far away", and can be used to specify directions on your 2D plane (since they don't get affected by translation!).
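Here is a minimal Python sketch of the 2D picture above. The function names (to_homogeneous, from_homogeneous, translate) are made up for illustration, and translate is just the usual 3x3 translation matrix written out by hand.

```python
def to_homogeneous(x, y):
    """Lift a 2D point onto the z=1 plane."""
    return (x, y, 1)

def from_homogeneous(p):
    """Divide by z to get back to the 2D plane; None if z == 0 (a direction)."""
    x, y, z = p
    if z == 0:
        return None
    return (x / z, y / z)

def translate(p, tx, ty):
    """Apply the matrix [[1,0,tx],[0,1,ty],[0,0,1]] to the triple p."""
    x, y, z = p
    return (x + tx * z, y + ty * z, z)

point = to_homogeneous(2, 3)
scaled = tuple(5 * c for c in point)   # same line through the origin
direction = (1, 0, 0)                  # z = 0: "infinitely far away"

print(from_homogeneous(scaled))                    # (2.0, 3.0)
print(from_homogeneous(translate(point, 10, 20)))  # (12.0, 23.0)
print(translate(direction, 10, 20))                # (1, 0, 0)
```

Scaling the triple doesn't change the 2D point it stands for, and the z = 0 triple comes out of the translation untouched.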

Once you wrap your head around the 2D case, it's fairly easy to extend it to 3D.

There isn't anything inherently magical about homogeneous coordinates; they're just a convenient notation for referring to points in space that happens to lend itself particularly well to affine transforms in that space.

For anyone interested, I highly recommend reading this article: http://deltaorange.com/2012/03/08/the-truth-behind-homogenou...


I understand what you're saying, but they're still very magical to me. They are taking something that is a non-linear operation in R3 (like a translation) and making it linear by using a trick and giving it an extra dimension.

This is somehow very unsettling to me. (What is the set of operations that can be linearized like this?) I wish I "got it" better.


The trick is that a translation of a point (a1, ..., an) is essentially a shear of the point (a1, ..., an, 1), which is a linear operation in R^(n+1). But because the points of interest lie on a (hyper)plane that is not a vector space (it does not contain the origin), the shear looks like a nonlinear operation after projection to R^n.

    ^
    |---a
    |   |
  --+--------->
    |

    ^
    | ,---a
    |/   /
  --+--------->
    |


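And you can see how points on the a_(n+1) = 0 plane ("infinitely far away" in projective terms) are invariant under this operation, so they behave like pure direction vectors and can be transformed with the same matrix as points! (For normals that holds as long as the matrix is a rigid motion; once non-uniform scaling or shear is involved, normals need the inverse transpose instead.)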
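To make the shear concrete, here is a small Python sketch for n = 2. The helper names shear and add are hypothetical; shear adds t[i] times the last coordinate to coordinate i, which is exactly what the translation matrix does.

```python
def shear(v, t):
    """Linear map on R^(n+1): add t[i] * (last coordinate) to coordinate i."""
    *head, last = v
    return tuple(h + ti * last for h, ti in zip(head, t)) + (last,)

def add(u, v):
    """Componentwise vector sum."""
    return tuple(a + b for a, b in zip(u, v))

t = (4, -1)
a = (2, 3, 1)   # the point (2, 3) lifted to the plane a_(n+1) = 1
d = (0, 5, 0)   # a direction on the plane a_(n+1) = 0

print(shear(a, t))   # (6, 2, 1): the point moved by (4, -1)
print(shear(d, t))   # (0, 5, 0): the direction did not move

# The shear really is linear in R^(n+1): it commutes with vector addition.
u, v = (1, 2, 3), (4, 5, 6)
print(shear(add(u, v), t) == add(shear(u, t), shear(v, t)))   # True
```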


This explanation really helped, thank you.


The translation part is not that complex: you're basically just introducing an extra coordinate 'w' which is always 1, so you can write x -> x + w, which is linear, rather than x -> x + 1, which is not.

Things start to get weird when you note that this works even when w isn't 1. If you then decide that e.g. (x,y,w) and (x/w, y/w, 1) should be the same point, you can do some rather interesting things. Although, if you identify (x:y:w) with all the points on the line (lx:ly:lw) (for any nonzero l), then suddenly (x/w, y/w, 1) is just the point where this line intersects the plane w=1, which isn't too hard to visualize. And if you think about it, this also kind of explains why this coordinate system can be used to project a 3D scene onto a 2D surface with correct perspective.
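A rough Python sketch of the perspective part, under assumptions that are mine and not the comment's: a pinhole camera at the origin looking down +z, an image plane at z = 1, and a made-up matrix PROJECT whose last row copies z into w. Fractions keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical pinhole setup: camera at the origin looking down +z,
# image plane at z = 1. The last row copies z into w.
PROJECT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

def normalize(p):
    """Pick the representative of p with last coordinate 1 (requires it != 0)."""
    *head, w = p
    return tuple(Fraction(c, w) for c in head) + (1,)

def project(point3d):
    """Project (x, y, z) onto the image plane; returns (x/z, y/z)."""
    x, y, z = point3d
    clip = mat_vec(PROJECT, (x, y, z, 1))   # w is now z, not 1
    return normalize(clip)[:2]

near = project((2, 1, 2))
far = project((2, 1, 8))
print(near, far)   # same x and y, but the far point lands closer to the center
```

The matrix itself is still linear; all the perspective comes from the final "pick the w = 1 representative" step.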


> All the surfaces described by an equation of degree n are the same.

Hang on a minute, let's not get ahead of ourselves. First of all the degenerate surfaces (e.g. (x - w)^n = 0) are clearly different from the other surfaces. Second of all, while you can use simple linear transformations to transform most quadrics into one another, there are two cases where this won't work if you use real numbers. Up to linear isomorphism these are:

X^2 + Y^2 + Z^2 - W^2 = 0

X^2 + Y^2 - Z^2 - W^2 = 0

(to get all degenerate or 'trivial' cases you also need to include equations where some of the coefficients are 0, or all coefficients are the same, see Sylvester's law of inertia for more information).

Admittedly, in 2D there is only one interesting case, X^2 + Y^2 - W^2 = 0 (note that this is a cone, which is why all 2D quadrics can be generated by slicing a cone).
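The cone-slicing remark can be checked with a few lines of Python. This is only a sketch: cone_point uses the standard rational parametrization (1 - t^2, 2t, 1 + t^2), and the three slice functions (names invented here) read the same homogeneous point in three different planes.

```python
from fractions import Fraction

def cone_point(t):
    """A homogeneous point (X, Y, W) with X^2 + Y^2 - W^2 == 0."""
    return (1 - t * t, 2 * t, 1 + t * t)

def slice_w(p):
    """Plane W = 1: the cone shows up as the circle x^2 + y^2 = 1."""
    X, Y, W = p
    return (Fraction(X, W), Fraction(Y, W))

def slice_y(p):
    """Plane Y = 1: the hyperbola w^2 - x^2 = 1, returned as (x, w)."""
    X, Y, W = p
    return (Fraction(X, Y), Fraction(W, Y))

def slice_w_minus_x(p):
    """Plane W - X = 1: the parabola y^2 = 2x + 1."""
    X, Y, W = p
    return (Fraction(X, W - X), Fraction(Y, W - X))

for t in (1, 2, 3):   # t = 0 is skipped: it lies outside two of the slices
    p = cone_point(t)
    x, y = slice_w(p)
    assert x * x + y * y == 1
    x, w = slice_y(p)
    assert w * w - x * x == 1
    x, y = slice_w_minus_x(p)
    assert y * y == 2 * x + 1
print("circle, hyperbola and parabola all come from the same cone")
```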

And that's just the 2nd order; in higher orders things get complicated quickly, although I suppose that in the complex projective plane all surfaces (or rather curves) of the same degree actually are equivalent, albeit only topologically.


What I find most amazing about homogeneous coordinates is the duality between points and lines and their behavior under vector cross product.

Basically:

cross(p1, p2) = line_between_p1_and_p2

cross(line1, line2) = point_of_intersection_of_lines_1_and_2

To understand this you would need to dive a bit into projective space and how lines are represented there, but it is very enlightening.
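A quick Python sketch of the two formulas above, assuming a line a*x + b*y + c = 0 is stored as the triple (a, b, c) and a point (x, y) as (x, y, 1). The helpers cross and dot are written out by hand.

```python
def cross(a, b):
    """3D vector cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """3D dot product; dot(line, point) == 0 means the point is on the line."""
    return sum(x * y for x, y in zip(a, b))

p1 = (0, 0, 1)            # the point (0, 0)
p2 = (1, 1, 1)            # the point (1, 1)
line = cross(p1, p2)
print(line)               # (-1, 1, 0), i.e. -x + y = 0
print(dot(line, p1), dot(line, p2))   # 0 0

line1 = (1, 0, -2)        # x = 2
line2 = (0, 1, -3)        # y = 3
print(cross(line1, line2))            # (2, 3, 1), i.e. the point (2, 3)

parallel = (1, 0, -5)     # x = 5, parallel to line1
print(cross(line1, parallel))         # (0, 3, 0): a point at infinity
```

Parallel lines need no special case: they meet in a triple whose last component is 0, pointing along their shared direction.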


And in 3 dimensions it's points and planes that are dual - and finding the line between two points or the common line of two planes can now be done in a similar way, but using Plücker coordinates for the lines instead.
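For the 3D version, here is a hedged Python sketch. It assumes one common convention: a line is stored as (direction d, moment m) with x cross d == m for every point x on it, and a plane n.x + e = 0 as (n, e). The function names are invented for the example.

```python
def cross(a, b):
    """3D vector cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    """Pluecker coordinates (d, m) of the line through 3D points p and q."""
    d = tuple(b - a for a, b in zip(p, q))
    return d, cross(p, q)

def line_of_planes(plane1, plane2):
    """Pluecker coordinates (d, m) of the common line of two planes (n, e)."""
    (n1, e1), (n2, e2) = plane1, plane2
    d = cross(n1, n2)
    m = tuple(e1 * b - e2 * a for a, b in zip(n1, n2))
    return d, m

def contains(line, x):
    """True if the 3D point x lies on the line."""
    d, m = line
    return cross(x, d) == m

a = line_through((1, 0, 0), (1, 1, 1))
b = line_of_planes(((1, 0, 0), -1),    # the plane x = 1
                   ((0, 1, -1), 0))    # the plane y = z
print(a)        # ((0, 1, 1), (0, -1, 1))
print(a == b)   # True: both constructions describe the same line
print(contains(a, (1, 2, 2)), contains(a, (0, 0, 0)))   # True False
```

In general the two constructions only agree up to a common scale factor; the numbers here were picked so that they match exactly.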


Those two operations are called "meet" and "join", and in principle they work with any linear subspaces (point, line, plane, etc.) So the meet of a line and a plane in 3D is a point, the join of a point and a line in 3D is a plane, etc.

Wikipedia is good on this stuff if you are patient with the mathematician's way of expressing things...

https://en.m.wikipedia.org/wiki/Join_and_meet

https://en.m.wikipedia.org/wiki/Linear_subspace

"Lattice of subspaces

The operations intersection and sum make the set of all subspaces a bounded modular lattice, where the {0} subspace, the least element, is an identity element of the sum operation, and the identical subspace V, the greatest element, is an identity element of the intersection operation."
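The two 3D examples mentioned earlier in this comment (line meets plane, point joins line) can be sketched in Python. The conventions are assumptions for the sake of the example: a line is (direction d, moment m) with x cross d == m on the line, a plane n.x + e = 0 is (n, e), and the function names are made up.

```python
def cross(a, b):
    """3D vector cross product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """3D dot product."""
    return sum(x * y for x, y in zip(a, b))

def meet_line_plane(line, plane):
    """Homogeneous point (x, y, z, w) where the line crosses the plane."""
    (d, m), (n, e) = line, plane
    nm = cross(n, m)
    return tuple(a - e * b for a, b in zip(nm, d)) + (dot(n, d),)

def join_point_line(x, line):
    """Plane (n, e) containing the 3D point x and the line."""
    d, m = line
    xd = cross(x, d)
    n = tuple(a - b for a, b in zip(m, xd))
    return n, -dot(m, x)

line = ((0, 1, 1), (0, -1, 1))   # through (1, 0, 0) with direction (0, 1, 1)

print(meet_line_plane(line, ((0, 0, 1), -3)))   # (1, 3, 3, 1): hits z = 3 at (1, 3, 3)
print(meet_line_plane(line, ((1, 0, 0), -5)))   # (0, 4, 4, 0): parallel, meets at infinity
print(join_point_line((0, 0, 0), line))         # ((0, -1, 1), 0): the plane y = z
```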


Then you can go full geometric algebra and base your whole geometry on this lattice.


The .w component clicked for me when I understood that it lets you transform 3D vectors (describing a direction and magnitude, .w is 0) and 3D points (describing a position in space, .w is 1) by the same 4x4 matrix. The translation part of the matrix is multiplied by the w component. Thus a vector (with .w=0) will not be translated, only rotated and scaled, while a point (with .w=1) will be scaled, rotated and translated.

All the esoteric projection and 4D stuff is really not that important compared to the simple feature of letting you feed positions and directions into the same matrix operations.
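As a small Python illustration of that point: the matrix M below is an arbitrary example (scale by 2, rotate 90 degrees about z, then translate by (10, 20, 30)), not something taken from the article.

```python
# Example transform: scale by 2, rotate 90 degrees about z, translate by (10, 20, 30).
M = [
    [0, -2, 0, 10],
    [2,  0, 0, 20],
    [0,  0, 2, 30],
    [0,  0, 0,  1],
]

def transform(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)

point = (1, 0, 0, 1)    # a position: .w is 1
vector = (1, 0, 0, 0)   # a direction: .w is 0

print(transform(M, point))    # (10, 22, 30, 1): scaled, rotated and translated
print(transform(M, vector))   # (0, 2, 0, 0): scaled and rotated only

# The difference of two points has .w == 0, so it behaves as a vector.
a, b = (5, 5, 5, 1), (4, 5, 5, 1)
diff = tuple(x - y for x, y in zip(a, b))
print(diff, transform(M, diff))   # (1, 0, 0, 0) (0, 2, 0, 0)
```

The last column of M only ever gets multiplied by .w, which is the whole trick.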


That article totes copied the figures from the net without shouting out the original creators!



