A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs with a Costly max-Oracle

Neel Shah, Vladimir Kolmogorov, and Christoph H. Lampert.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.


Structural support vector machines (SSVMs) are amongst the best performing models for many structured computer vision tasks, such as semantic image segmentation or human pose estimation. Training SSVMs, however, is computationally costly, because it requires repeated calls to a structured prediction subroutine (called the max-oracle), which itself has to solve an optimization problem, e.g. a graph cut. In this work, we introduce a new technique for SSVM training that is more efficient than earlier techniques when the max-oracle is computationally expensive, as is frequently the case in computer vision tasks. The main idea is to combine the recent stochastic Block-Coordinate Frank-Wolfe method with efficient hyperplane caching and to use an automatic selection rule for deciding whether to call the max-oracle or to rely on one of the cached hyperplanes. We show experimentally that this strategy leads to faster convergence to the optimum with respect to the number of required oracle calls, and that this also translates into faster convergence with respect to the total runtime in cases where the max-oracle is slow compared to the other steps of the algorithm. A publicly available C++ implementation is provided.
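To make the main idea concrete, the following is a minimal Python sketch of Block-Coordinate Frank-Wolfe with a per-example hyperplane cache. It is not the authors' C++ implementation. It rests on several assumptions: a toy multiclass model whose max-oracle simply enumerates the labels (standing in for an expensive solver such as a graph cut), a 0/1 loss, and a fixed schedule of one exact pass followed by `cached_passes` cheap passes in place of the automatic selection rule described above. All names (`exact_oracle`, `fw_step`, `train`, `cached_passes`) are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def psi(x, y_true, y, num_classes):
    """Feature difference phi(x, y_true) - phi(x, y) for a toy multiclass model."""
    d = len(x)
    out = [0.0] * (d * num_classes)
    if y != y_true:
        for k in range(d):
            out[y_true * d + k] += x[k]
            out[y * d + k] -= x[k]
    return out

def exact_oracle(w, x, y_true, num_classes, lam, n):
    """Stand-in for the costly max-oracle: loss-augmented argmax over all labels.
    Returns the corresponding plane (w_s, l_s) in the usual BCFW scaling."""
    best = None
    for y in range(num_classes):
        p = psi(x, y_true, y, num_classes)
        loss = 0.0 if y == y_true else 1.0  # assumed 0/1 loss
        score = loss - dot(w, p)
        if best is None or score > best[0]:
            best = (score, p, loss)
    _, p, loss = best
    return ([v / (lam * n) for v in p], loss / n)

def fw_step(w, w_i, l_i, plane, lam):
    """Frank-Wolfe update of block i towards `plane` with exact line search."""
    w_s, l_s = plane
    diff = [a - b for a, b in zip(w_i, w_s)]
    denom = lam * dot(diff, diff)
    if denom <= 0.0:
        return w, w_i, l_i  # block already sits on this plane
    gamma = (lam * dot(diff, w) - l_i + l_s) / denom
    gamma = max(0.0, min(1.0, gamma))
    new_w_i = [(1 - gamma) * a + gamma * b for a, b in zip(w_i, w_s)]
    new_l_i = (1 - gamma) * l_i + gamma * l_s
    new_w = [a + b - c for a, b, c in zip(w, new_w_i, w_i)]
    return new_w, new_w_i, new_l_i

def dual_value(w, l_i, lam):
    return sum(l_i) - 0.5 * lam * dot(w, w)

def train(data, num_classes, lam=0.1, outer_iters=10, cached_passes=3, seed=0):
    rng = random.Random(seed)
    n = len(data)
    dim = len(data[0][0]) * num_classes
    w = [0.0] * dim
    w_i = [[0.0] * dim for _ in range(n)]
    l_i = [0.0] * n
    cache = [[] for _ in range(n)]  # planes previously returned by the oracle
    oracle_calls = 0
    history = []  # dual objective after every pass
    order = list(range(n))
    for _ in range(outer_iters):
        # Exact pass: one max-oracle call per example; remember each plane.
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            plane = exact_oracle(w, x, y, num_classes, lam, n)
            oracle_calls += 1
            if plane not in cache[i]:
                cache[i].append(plane)
            w, w_i[i], l_i[i] = fw_step(w, w_i[i], l_i[i], plane, lam)
        history.append(dual_value(w, l_i, lam))
        # Cheap passes: reuse the best cached plane instead of calling the oracle.
        for _ in range(cached_passes):
            rng.shuffle(order)
            for i in order:
                plane = max(cache[i], key=lambda p: p[1] - lam * dot(w, p[0]))
                w, w_i[i], l_i[i] = fw_step(w, w_i[i], l_i[i], plane, lam)
            history.append(dual_value(w, l_i, lam))
    return w, oracle_calls, history
```

Calling `train(data, num_classes=3)` on a list of `(features, label)` pairs returns the weight vector, the number of exact oracle calls, and the dual objective after each pass. Because every update uses a clipped exact line search, the cached passes can only keep or improve the dual objective while adding no oracle calls, which is the effect the hyperplane cache is meant to exploit.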


[.pdf] arXiv version