🏆 ScreenSpot (v2) Leaderboard 🏆

📝 Notes

  1. Evaluated using HumanEval+ version 0.1.10 and MBPP+ version 0.2.0.
  2. Models are ranked by pass@1 under greedy decoding. Setup details can be found here.
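
As a rough illustration of the ranking rule in note 2: under greedy decoding each model produces exactly one completion per task, so pass@1 reduces to the fraction of tasks whose single completion passes all tests. The sketch below assumes per-task pass/fail results are already available. The function names, task IDs, and model names are hypothetical and are not taken from the leaderboard's own tooling.

```python
def pass_at_1_greedy(results):
    """Return pass@1 for one model under greedy decoding.

    `results` maps a task ID to True/False: whether the model's single
    greedy completion for that task passed every test (assumed input shape).
    """
    if not results:
        raise ValueError("no results to score")
    solved = sum(1 for passed in results.values() if passed)
    return solved / len(results)


def rank_models(scores):
    """Sort (model, pass@1) pairs from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data, for illustration only.
example_results = {
    "model_a": {"task_1": True, "task_2": False, "task_3": True, "task_4": True},
    "model_b": {"task_1": True, "task_2": False, "task_3": False, "task_4": True},
}
example_scores = {name: pass_at_1_greedy(r) for name, r in example_results.items()}
example_ranking = rank_models(example_scores)
```

With more than one sample per task, pass@k would need an estimator over samples. The single-sample case shown here is only the simplification that greedy decoding allows.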