Visual Dialog is a novel task that requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Given an image, dialog history, and a follow-up question about the image, the agent has to answer the question.
This demo is built on the Late Fusion architecture described in our CVPR 2017 paper. The model was trained on VisDial v1.0 train+val and uses ResNeXt detector features, with Pythia for captioning. Code is available here.
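To give a rough feel for what "late fusion" means here, the sketch below encodes the image, the question, and the dialog history separately, concatenates the three vectors, and projects them to a single joint representation. It is an illustrative toy, not the demo's code: the function names, the toy dimensions, the random weights, and the tanh nonlinearity are all assumptions, and the upstream encoders (image features, question and history encoders) are replaced by placeholder vectors.

```python
import math
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def late_fusion(image_feat, question_feat, history_feat, weights, bias):
    """Fuse three independently encoded inputs into one joint vector.

    Each input is assumed to be a fixed-size vector produced upstream;
    those encoders are not shown. The tanh is an assumption of this sketch.
    """
    fused = image_feat + question_feat + history_feat  # list concatenation
    return [math.tanh(v) for v in linear(weights, bias, fused)]


# Toy dimensions and random placeholder features, for illustration only.
rng = random.Random(0)
image_feat = [rng.uniform(-1, 1) for _ in range(4)]
question_feat = [rng.uniform(-1, 1) for _ in range(3)]
history_feat = [rng.uniform(-1, 1) for _ in range(3)]

in_dim = len(image_feat) + len(question_feat) + len(history_feat)
joint_dim = 5
weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
           for _ in range(joint_dim)]
bias = [0.0] * joint_dim

joint = late_fusion(image_feat, question_feat, history_feat, weights, bias)
print(len(joint))  # size of the joint representation
```

In a full model, a decoder would then use the joint vector to rank or generate candidate answers; that step is omitted here.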
For more details about the dataset, task and models, please visit visualdialog.org.
Terms of Service
Text or media that you upload to the demo will be stored and used for research purposes.