Multimodal Q&A – Visual Dialog Task
The Visual Dialog task is a multimodal task that adds an image to a conventional Q&A task, so each question has to be answered with reference to the image. For example, given a picture of a white cat and a black dog together and the question "What color is the animal next to the cat?", the answer is "black"...
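The cat-and-dog example can be sketched as a small data structure to show what a single sample contains. This is only a toy illustration, not how a real Visual Dialog model works: the image is replaced by a hand-written mapping from animals to colors, and the names `VisualDialogSample` and `answer_color_of_neighbor` are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VisualDialogSample:
    """One toy sample: an image stand-in plus the question/answer history."""
    # Assumption: the image is stood in for by an animal -> color mapping.
    # A real system would hold pixels or extracted visual features instead.
    objects: Dict[str, str]
    dialog: List[Tuple[str, str]] = field(default_factory=list)


def answer_color_of_neighbor(sample: VisualDialogSample, anchor: str) -> str:
    """Return the color of the single animal in the scene that is not `anchor`."""
    others = [color for name, color in sample.objects.items() if name != anchor]
    if len(others) != 1:
        raise ValueError("the toy scene must contain exactly one other animal")
    return others[0]


# The picture from the text: a white cat and a black dog together.
sample = VisualDialogSample(objects={"cat": "white", "dog": "black"})

question = "What color is the animal next to the cat?"
answer = answer_color_of_neighbor(sample, anchor="cat")

# Record the exchange so that later questions could refer back to it.
sample.dialog.append((question, answer))
print(answer)
```

The point of the sketch is that the question alone is not enough: the answer "black" comes only from combining the text of the question with what is in the image.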