Streaming 4D Visual Geometry Transformer

GitHub Repository | Project Page | Paper | Hugging Face Model

Big thanks to the VGGT team for sharing their awesome code! We built this demo on top of it.

Upload a video or a set of images to create a 3D reconstruction of a scene or object. StreamVGGT takes these images and generates a 3D point cloud, along with estimated camera poses.

Getting Started:

  1. Upload Your Data: Use the "Upload Video" or "Upload Images" buttons on the left to provide your input. Videos will be automatically split into individual frames (one frame per second).
  2. Preview: Your uploaded images will appear in the gallery on the left.
  3. Reconstruct: Click the "Reconstruct" button to start the 3D reconstruction process.
  4. Visualize: The 3D reconstruction will appear in the viewer on the right. You can rotate, pan, and zoom to explore the model, and download the GLB file. Note that visualizing the 3D points may be slow for a large number of input images.
  5. Adjust Visualization (Optional): After reconstruction, you can fine-tune the visualization using the options below (click to expand):
    • Confidence Threshold: Adjust the filtering of points based on confidence.
    • Show Points from Frame: Select specific frames to display in the point cloud.
    • Show Camera: Toggle the display of estimated camera positions.
    • Filter Sky / Filter Black Background: Remove sky or black-background points.
    • Select a Prediction Mode: Choose between "Depthmap and Camera Branch" or "Pointmap Branch."
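
To make the Confidence Threshold option more concrete, here is a minimal sketch of percentile-style filtering. It assumes the threshold is a percentage of the lowest-confidence points to discard; the page does not say how the demo computes it, so treat that as an assumption. The function name `filter_by_confidence` and the list-based point format are hypothetical and not taken from the StreamVGGT code.

```python
def filter_by_confidence(points, confidences, threshold_percent):
    """Drop roughly the lowest `threshold_percent` percent of points by confidence.

    Assumption: the threshold is a percentile over all points, not an
    absolute confidence value.
    """
    if len(points) != len(confidences):
        raise ValueError("points and confidences must have the same length")
    if not 0 <= threshold_percent <= 100:
        raise ValueError("threshold_percent must be between 0 and 100")
    if not points:
        return []

    ranked = sorted(confidences)
    # Number of lowest-confidence points to discard.
    n_drop = int(len(ranked) * threshold_percent / 100)
    if n_drop >= len(ranked):
        return []

    # Every point at or above the cutoff survives; ties at the cutoff are kept.
    cutoff = ranked[n_drop]
    return [p for p, c in zip(points, confidences) if c >= cutoff]


points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
confidences = [0.1, 0.9, 0.5, 0.7]
kept = filter_by_confidence(points, confidences, 50)
```

In this sketch, raising the threshold removes more points, which tends to clean up noisy regions at the cost of a sparser point cloud.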

Please note: StreamVGGT typically reconstructs a scene in less than 1 second. However, visualizing 3D points may take tens of seconds due to third-party rendering, which is independent of StreamVGGT's processing time.
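
Step 1 above says uploaded videos are split at one frame per second. The sketch below shows one way such sampling could pick frame indices from a video's frame rate. It assumes the step is the frame rate rounded to the nearest whole frame; the demo's actual rounding is not stated here, and the helper name `frames_to_keep` is hypothetical.

```python
def frames_to_keep(total_frames, fps, interval_seconds=1.0):
    """Return the indices of frames sampled once every `interval_seconds`.

    Assumption: the sampling step is fps * interval_seconds rounded to the
    nearest whole frame, with a minimum step of one frame.
    """
    if fps <= 0 or interval_seconds <= 0:
        raise ValueError("fps and interval_seconds must be positive")
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


# A 3-second clip at 30 fps has 90 frames; one frame is kept per second.
indices = frames_to_keep(total_frames=90, fps=30.0)
```

Under this assumption, a short clip yields only a handful of input images, which also keeps the point-cloud visualization responsive.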


