The great hope for vision-language AI models is that they will one day become capable of greater autonomy and versatility, ...