This is a demo of Visual Dialog, accompanying the CVPR 2017 paper, hosted on CloudCV.
Visual Dialog is a novel task that requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the agent has to answer the question.
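To make the task inputs concrete, here is a minimal Python sketch of what an agent sees at each round. The names `DialogState`, `flatten_history`, and `answer_question` are hypothetical and chosen for illustration; they are not part of the demo's code, and the placeholder agent stands in for a trained model.

```python
from typing import List, Tuple


class DialogState:
    """Hypothetical container for the inputs to one Visual Dialog round."""

    def __init__(self, image_path: str, caption: str) -> None:
        self.image_path = image_path              # the image under discussion
        self.caption = caption                    # image caption that opens the dialog
        self.history: List[Tuple[str, str]] = []  # past (question, answer) rounds

    def add_round(self, question: str, answer: str) -> None:
        """Record a completed question-answer round."""
        self.history.append((question, answer))

    def flatten_history(self) -> str:
        """Join the caption and all past rounds into a single string."""
        parts = [self.caption]
        for question, answer in self.history:
            parts.append(f"Q: {question} A: {answer}")
        return " ".join(parts)


def answer_question(state: DialogState, question: str) -> str:
    """Placeholder agent: a real model would condition on the image,
    the dialog history, and the follow-up question."""
    return "unknown"  # stand-in answer, not a model prediction
```

A round would then consist of calling `answer_question(state, question)` and storing the result with `state.add_round(question, answer)` so that later questions can refer back to it.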
The model is built on the Late Fusion architecture specified in our CVPR 2017 paper. It was trained on VisDial v1.0 train+val, uses ResNeXt detector features, and relies on Pythia for captioning. Code is available here.
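As a rough illustration of the Late Fusion idea, the sketch below assumes that the image, dialog history, and question have already been encoded separately into feature vectors, then concatenates them and applies a single linear transform to obtain a joint representation. The function name `late_fusion`, the toy vectors, and the weights are all hypothetical; the encoders are omitted, and the demo model's actual layers and dimensions may differ.

```python
from typing import List

Vector = List[float]


def late_fusion(image_feat: Vector, history_feat: Vector, question_feat: Vector,
                weights: List[Vector], bias: Vector) -> Vector:
    """Concatenate separately encoded inputs and apply one linear layer.

    Assumption: each input was encoded on its own beforehand (for example,
    detector features for the image and recurrent encoders for the text).
    """
    fused = image_feat + history_feat + question_feat  # list concatenation
    if len(weights) != len(bias) or any(len(row) != len(fused) for row in weights):
        raise ValueError("weight/bias shapes do not match the fused vector")
    # Each output unit is a dot product of one weight row with the fused vector.
    return [sum(w * x for w, x in zip(row, fused)) + b
            for row, b in zip(weights, bias)]


# Toy usage with made-up numbers: 2-d image, 1-d history, 1-d question features.
joint = late_fusion(
    image_feat=[1.0, 0.0],
    history_feat=[0.5],
    question_feat=[2.0],
    weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 1.0]],
    bias=[0.0, 0.5],
)
```

The point of fusing late is that each modality keeps its own encoder, and the inputs only interact in the final joint layer.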
For more details about the dataset, task and models, please visit visualdialog.org.