The Scenes With Occluded Regions Dataset (SWORD) contains around 1500 videos with 50 frames per video on average.
The dataset was obtained by processing manually captured video sequences of static real-life urban scenes.
The main property of the dataset is the abundance of close objects and, consequently, a higher prevalence of occlusions.
For each video, the camera poses form a trajectory: each pose specifies the camera position and orientation at one point along it.
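As a rough illustration of the pose trajectory described above, the sketch below models one video as an ordered list of poses. The on-disk pose format is not specified here, so the `CameraPose` class, its field names, the camera-to-world convention, and the `trajectory_length` helper are all hypothetical and serve only to make the structure concrete.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class CameraPose:
    """One pose on a trajectory (hypothetical layout, not the dataset's file format)."""
    position: Vec3  # camera centre in world coordinates (assumed convention)
    rotation: Mat3  # 3x3 rotation matrix, camera-to-world (assumed convention)


def trajectory_length(poses: List[CameraPose]) -> float:
    """Sum of distances between consecutive camera positions."""
    return sum(math.dist(a.position, b.position) for a, b in zip(poses, poses[1:]))


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Synthetic example: 50 poses (the average video length) moving along the x axis.
poses = [CameraPose((0.1 * i, 0.0, 0.0), IDENTITY) for i in range(50)]
print(len(poses), trajectory_length(poses))
```

Keeping position and orientation as separate fields mirrors the description above; a real loader may instead store each pose as a single 4x4 matrix or as a quaternion plus translation.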
Figure 2: Samples from the dataset with estimated occlusion masks for the given stereo pairs. See the supplementary material of the paper.