HOTR is a novel framework which directly predicts a set of {human, object, interaction} triplets from an image using a transformer-based encoder-decoder. Through the set-level prediction, our method ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results