
Bundler output files

Bundler produces files typically called 'bundle_*.out' (we'll call these "bundle files"). With the default commands, Bundler outputs a bundle file called 'bundle_<n>.out' containing the current state of the scene after each set of images has been registered (n = the number of currently registered cameras). After all possible images have been registered, Bundler outputs a final file named 'bundle.out'. In addition, a "ply" file containing the reconstructed cameras and points is written after each round. These ply files can be viewed with the "scanalyze" mesh viewer, available at http://graphics.stanford.edu/software/scanalyze/. Several other viewers can also read ply files (useful since scanalyze can sometimes be difficult to compile under Linux), including Meshlab and Blender (where you can use File->Import->PLY to open a ply file; thanks to Ricardo Fabbri for the tip).

The bundle files contain the estimated scene and camera geometry and have the following format:

# Bundle file v0.3
<num_cameras> <num_points>   [two integers]
<camera1>
<camera2>
   ...
<cameraN>
<point1>
<point2>
   ...
<pointM>
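
As a concrete illustration, the header can be read with a short Python sketch (read_bundle_header is an illustrative name of mine, not part of Bundler; it assumes the two counts sit on the line after the version comment, as shown above):

def read_bundle_header(fh):
    # First line is a comment identifying the format version,
    # e.g. "# Bundle file v0.3".
    version_line = fh.readline().strip()
    # Second line: <num_cameras> <num_points> as two integers.
    num_cameras, num_points = map(int, fh.readline().split())
    return version_line, num_cameras, num_points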

Each camera entry contains the estimated camera intrinsics and extrinsics, and has the form:

<f> <k1> <k2>   [the focal length, followed by two radial distortion coeffs]
<R>             [a 3x3 matrix representing the camera rotation]
<t>             [a 3-vector describing the camera translation]

The cameras are specified in the order they appear in the list of images.
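
Continuing the sketch, a camera entry can be read as follows (read_camera is again an illustrative helper; it assumes each camera occupies five lines, one for f, k1, and k2, three for the rows of R, and one for t, matching the layout above, and that numpy is available):

import numpy as np

def read_camera(fh):
    # <f> <k1> <k2> on one line.
    f, k1, k2 = map(float, fh.readline().split())
    # <R>: 3x3 rotation matrix, one row per line.
    R = np.array([[float(v) for v in fh.readline().split()] for _ in range(3)])
    # <t>: translation 3-vector on one line.
    t = np.array([float(v) for v in fh.readline().split()])
    return f, k1, k2, R, t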

Each point entry has the form:

<position>      [a 3-vector describing the 3D position of the point]
<color>         [a 3-vector describing the RGB color of the point]
<view list>     [a list of views the point is visible in]

The view list begins with the length of the list (i.e., the number of cameras the point is visible in). The list is then given as a list of quadruplets <camera> <key> <x> <y>, where <camera> is a camera index, <key> is the index of the SIFT keypoint where the point was detected in that camera, and <x> and <y> are the detected positions of that keypoint. Both indices are 0-based (e.g., if camera 0 appears in the list, this corresponds to the first camera in the scene file and the first image in "list.txt"). The pixel positions are floating point numbers in a coordinate system where the origin is the center of the image, the x-axis increases to the right, and the y-axis increases towards the top of the image. Thus, (-w/2, -h/2) is the lower-left corner of the image, and (w/2, h/2) is the top-right corner (where w and h are the width and height of the image).
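
A point entry can be read the same way (read_point is an illustrative helper; it assumes the position, the color, and the entire view list each occupy a single line):

def read_point(fh):
    # <position>: 3D point location.
    position = [float(v) for v in fh.readline().split()]
    # <color>: RGB values.
    color = [int(v) for v in fh.readline().split()]
    # <view list>: a count followed by <camera> <key> <x> <y> quadruplets.
    tok = fh.readline().split()
    n = int(tok[0])
    views = [(int(tok[1 + 4*i]), int(tok[2 + 4*i]),
              float(tok[3 + 4*i]), float(tok[4 + 4*i])) for i in range(n)]
    return position, color, views

Putting the three helpers together reads a whole bundle file:

with open('bundle.out') as fh:
    _, num_cameras, num_points = read_bundle_header(fh)
    cameras = [read_camera(fh) for _ in range(num_cameras)]
    points = [read_point(fh) for _ in range(num_points)]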

We use a pinhole camera model; the parameters we estimate for each camera are a focal length (f), two radial distortion parameters (k1 and k2), a rotation (R), and translation (t), as described in the file specification above. The formula for projecting a 3D point X into a camera (R, t, f) is:

P = R * X + t       (conversion from world to camera coordinates)
p = -P / P.z        (perspective division)
p' = f * r(p) * p   (conversion to pixel coordinates)

where P.z is the third (z) coordinate of P. In the last equation, r(p) is a function that computes a scaling factor to undo the radial distortion:

 r(p) = 1.0 + k1 * ||p||^2 + k2 * ||p||^4.

This gives a projection in pixels, where the origin of the image is the center of the image, the positive x-axis points right, and the positive y-axis points up (in addition, in the camera coordinate system, the positive z-axis points backwards, so the camera is looking down the negative z-axis, as in OpenGL).
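
In code, the projection might look like the following sketch (project is an illustrative name; numpy is assumed, and ||p|| is computed from the 2D normalized point, i.e., the first two components after the perspective division):

import numpy as np

def project(X, f, k1, k2, R, t):
    # P = R * X + t: world -> camera coordinates.
    P = R @ np.asarray(X, dtype=float) + t
    # p = -P / P.z: perspective division (keeping the 2D part).
    p = -P[:2] / P[2]
    # r(p) = 1 + k1*||p||^2 + k2*||p||^4: radial distortion scaling.
    r2 = float(p @ p)
    # p' = f * r(p) * p: pixel coordinates, origin at image center, +y up.
    return f * (1.0 + k1 * r2 + k2 * r2 * r2) * p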

Finally, the equations above imply that the camera viewing direction is:

R' * [0 0 -1]'  (i.e., the third row of -R or the third column of -R')

and the 3D position of the camera is:

-R' * t

(where ' indicates the transpose of a matrix or vector).
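
Both quantities follow directly from inverting the world-to-camera transform above; as a small sketch (camera_pose is an illustrative name):

import numpy as np

def camera_pose(R, t):
    # Viewing direction: R' * [0 0 -1]'.
    direction = R.T @ np.array([0.0, 0.0, -1.0])
    # Camera center in world coordinates: -R' * t.
    center = -R.T @ t
    return center, direction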
