Generalization in Multi-Modal Language Learning from Simulation