Flow6D: Discrete-to-Continuous Flow Matching for Efficient and Accurate Category-Level 6D Pose Estimation

IEEE Robotics and Automation Letters (RA-L)
1Zhejiang University, 2University of Science and Technology of China, 3Shanghai Jiao Tong University
*Equal contribution, Corresponding author
Prior candidate-based pose pipelines versus Flow6D.
Prior methods rely on brute-force candidate sampling and ranking (a). Flow6D (b) first localizes a discrete latent space around the true pose and then regresses a continuous pose, giving higher accuracy and much faster inference.

Abstract

6D pose estimation is a key task in computer vision and embodied AI. Existing methods regress the pose directly in a high-dimensional continuous space and face two key challenges: limited accuracy due to noise and local optima, and inefficient search over an infinite space that hinders real-time performance. We propose Flow6D, a hierarchical flow matching framework built on a two-stage strategy: discrete latent space localization followed by continuous pose regression. A discrete flow matching model first locks onto the latent space around the true pose to reduce search complexity; a continuous flow matching model then predicts local residuals to regress an accurate pose. The framework extends naturally to articulated objects and outperforms state-of-the-art methods on synthetic and real datasets, with real-time inference at 70 FPS.

Discrete-to-Continuous

Discrete flow matching localizes the true-pose latent space, while continuous flow matching refines residuals, making the estimate less sensitive to local optima and noise than direct regression.
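
As a rough illustration of the discrete-to-continuous split (not the authors' implementation), the sketch below quantizes a translation onto a uniform grid of bin centers and keeps the leftover offset as a continuous residual. The workspace bounds, bin count, and function names are assumptions chosen for the example.

```python
import math

def split_translation(t, lo=-0.5, hi=0.5, n_bins=10):
    """Split each coordinate of t into a bin index and a residual.

    lo, hi, and n_bins are hypothetical workspace bounds (metres) and
    resolution; the actual discretization used by Flow6D is not given here.
    """
    width = (hi - lo) / n_bins
    indices, residuals = [], []
    for x in t:
        # Clamp so that values on or beyond the bounds map to a valid bin.
        k = min(n_bins - 1, max(0, int(math.floor((x - lo) / width))))
        center = lo + (k + 0.5) * width
        indices.append(k)
        residuals.append(x - center)
    return indices, residuals

def merge_translation(indices, residuals, lo=-0.5, hi=0.5, n_bins=10):
    """Recover the continuous translation from bin indices and residuals."""
    width = (hi - lo) / n_bins
    return [lo + (k + 0.5) * width + r for k, r in zip(indices, residuals)]

t_true = [0.12, -0.33, 0.47]
idx, res = split_translation(t_true)
t_back = merge_translation(idx, res)
```

The discrete part narrows the search to one cell, and the residual is bounded by half a bin width, which is the kind of small, local target a continuous model can regress more easily than the full pose.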

Real-Time at 70 FPS

A compact structured search replaces costly brute-force candidate ranking, running in ~11 ms per frame on a single RTX 4090.

Rigid & Articulated

One unified framework for both rigid and multi-part articulated objects, generalizing robustly under occlusion, clutter, and illumination changes.

Method

Two-stage Flow6D framework. Stage I selects an anchor pose via discrete flow matching over uniformly sampled rotation/translation bins. Stage II refines it via continuous flow matching with adaptive latent pose sampling.
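
To make the idea of rotation bins concrete, here is a minimal sketch of one plausible way to build a set of anchor rotations and find the anchor closest to a given rotation. It assumes random uniform sampling over SO(3) with Shoemake's quaternion method and a hypothetical count of 512 anchors; this page does not describe how Flow6D actually samples its bins, and at inference the anchor would be predicted by the discrete flow matching model rather than looked up from a known pose.

```python
import math
import random

def random_unit_quaternion(rng):
    """Draw one rotation uniformly over SO(3) (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (a * math.sin(2.0 * math.pi * u2), a * math.cos(2.0 * math.pi * u2),
            b * math.sin(2.0 * math.pi * u3), b * math.cos(2.0 * math.pi * u3))

def geodesic_angle(q1, q2):
    """Rotation angle in radians separating two unit quaternions."""
    # abs() because q and -q encode the same rotation.
    dot = abs(sum(x * y for x, y in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))

def nearest_anchor(q, anchors):
    """Return (index, angle) of the anchor closest to q."""
    angles = [geodesic_angle(q, a) for a in anchors]
    best = min(range(len(anchors)), key=angles.__getitem__)
    return best, angles[best]

rng = random.Random(0)
anchors = [random_unit_quaternion(rng) for _ in range(512)]  # hypothetical bin count
q_true = random_unit_quaternion(rng)  # stand-in for a ground-truth rotation
anchor_id, residual_angle = nearest_anchor(q_true, anchors)
```

The returned index plays the role of the discrete label, and the residual angle is what a continuous stage would still have to correct.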

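Flow6D turns 6D pose estimation into a two-stage, discrete-to-continuous process: Stage I localizes a discrete anchor pose, and Stage II regresses the continuous residual around that anchor.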
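
A minimal sketch of how a continuous flow matching model can be sampled, assuming a straight-line probability path and plain Euler integration; neither choice is confirmed by this page. `oracle_velocity` is a hypothetical stand-in for a learned velocity network, and the residual values are made up for illustration.

```python
import random

def oracle_velocity(x, t, target):
    """Stand-in for a learned velocity network v(x, t).

    For the straight-line path x_t = (1 - t) * x_0 + t * x_1, the
    conditional velocity that carries x at time t to x_1 is
    (x_1 - x) / (1 - t).
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]

def integrate_flow(x0, velocity, n_steps=8):
    """Euler-integrate dx/dt = velocity(x, t) from t = 0 to t = 1."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays below 1, so the oracle never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
target_residual = [0.02, -0.01, 0.03]                 # hypothetical residual (metres)
x0 = [rng.gauss(0.0, 1.0) for _ in target_residual]   # noise sample at t = 0
refined = integrate_flow(x0, lambda x, t: oracle_velocity(x, t, target_residual))
```

With the oracle field, a handful of Euler steps transports the noise sample onto the target residual. A trained network only approximates this field, so the step count becomes a trade-off between accuracy and latency.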

Results

Rigid objects — REAL275

Using only depth input (no category prior), Flow6D sets a new state of the art while running ~5× faster than the diffusion-based GenPose.

Quantitative comparison on the REAL275 dataset for category-level pose estimation.
Qualitative results on the real-world REAL275 dataset. Red and green 3D boxes show the ground truth and our predictions, respectively.

Articulated objects — ArtImage

Flow6D achieves the lowest per-part rotation and translation errors while running two to three orders of magnitude faster than optimization-based baselines.
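
For readers unfamiliar with these metrics, the sketch below shows how rotation and translation errors are commonly computed for a single rigid part: the geodesic angle between the predicted and ground-truth rotation matrices, and the Euclidean distance between translations. The exact evaluation protocol used for ArtImage (for example symmetry handling or per-part averaging) is not specified on this page, so treat this as an assumption.

```python
import math

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    # trace(R_gt^T R_pred) equals the sum of elementwise products.
    trace = sum(R_gt[i][j] * R_pred[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))

def translation_error(t_pred, t_gt):
    """Euclidean distance between two translations (same unit as the inputs)."""
    return math.dist(t_pred, t_gt)

def rot_z(deg):
    """Rotation about the z-axis by deg degrees, as a nested list."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

rot_err = rotation_error_deg(rot_z(10.0), rot_z(0.0))
trans_err = translation_error([0.03, 0.04, 0.0], [0.0, 0.0, 0.0])
```

For an articulated object, the same two functions would be applied to each part's pose separately.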

Comparison with state-of-the-art methods on the ArtImage dataset across the Laptop, Eyeglasses, Dishwasher, Scissors, and Drawer categories.
Qualitative results on the ArtImage dataset: accurate per-part poses even at joints and under partial occlusion.

Real-world experiments

We evaluate Flow6D on rigid and articulated objects in real-world scenes. The videos show stable pose tracking through object motion, interaction, and articulation under changing viewpoints and partial occlusion.

Rigid objects

Tasks: mug pick-and-place and cross-container pouring.

Black-mug pick-and-place
Yellow-mug pick-and-place
Bottle-to-bowl pouring
Bottle-to-mug pouring
Bottle-to-mug pouring · different height
Mug-to-bottle pouring

Articulated objects

Tasks: laptop opening and closing.

Laptop closing · sequence 1
Laptop closing · sequence 2
Laptop opening · sequence 1
Laptop opening · sequence 2

BibTeX

@article{mei2026flow6d,
title={Flow6D: Discrete-to-Continuous Flow Matching for Efficient and Accurate Category-Level 6D Pose Estimation},
author={Mei, Mingyu and Zhang, Li and Dai, Zibo and Sun, Han and Zhao, Xinyue and Shen, Huiliang and He, Zaixing},
journal={IEEE Robotics and Automation Letters},
year={2026},
publisher={IEEE}
}