Abstract: “Visual recognition involves reasoning about structured relations at multiple levels of detail. For example, human behaviour analysis requires a comprehensive labeling that covers everything from individual low-level actions, through pairwise interactions, to high-level events. Similarly, scene understanding can benefit from considering labels together with their inter-relations. In this talk I will present recent work by our group on deep learning approaches capable of modeling these structures. I will present models for learning trajectory features that represent individual human actions, as well as hierarchical temporal models for group activity recognition. I will also describe general-purpose structured inference machines, which build on the notion of message passing within graphical models. These will be used in models for inferring individual and group activities, and for modeling structured relations in image labeling problems.”