
SWE-bench-Live
Evaluating your AI system on the latest software engineering tasks.
About
SWE-bench-Live is a live benchmark for issue resolving, designed to evaluate an AI system's ability to complete real-world software engineering tasks. Thanks to our automated dataset curation pipeline, we plan to update SWE-bench-Live on a monthly basis to provide the community with up-to-date task instances and support rigorous and contamination-free evaluation.
News
RepoLaunch update
We upgraded the RepoLaunch agent to support building repositories in all mainstream languages (C, C++, C#, Python, Java, Go, JS/TS, Rust) on both Linux and Windows platforms. We added test log parsing functionality, so test log parsing no longer depends on pytest! We also added minimal rebuild command generation for languages that require resolving dependencies and recompiling after a code fix before automated testing. SWE-bench-Live-MultiLang will be released soon following this major advancement! For the RepoLaunch preview, please refer to RepoLaunch_Preview.
Dataset update (through Aug 2025)
We've finalized the update process for SWE-bench-Live: Each month, we will add 50 newly verified, high-quality issues to the dataset. The lite and verified splits will remain frozen, ensuring fair leaderboard comparisons and keeping evaluation costs manageable. To access the latest issues, please refer to the full split!
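Below is a minimal sketch of how the splits might be loaded with the Hugging Face `datasets` library. The dataset ID and split names are assumptions for illustration; check the project documentation for the canonical identifiers.

```python
# Minimal sketch: loading SWE-bench-Live splits with the `datasets` library.
# The dataset ID and split names are assumptions; verify them against the
# official SWE-bench-Live documentation before use.
from datasets import load_dataset

# "full" is assumed to contain the continuously updated issues,
# while "lite" and "verified" are assumed to be the frozen splits.
full = load_dataset("SWE-bench-Live/SWE-bench-Live", split="full")
lite = load_dataset("SWE-bench-Live/SWE-bench-Live", split="lite")

print(f"full split: {len(full)} instances, lite split: {len(lite)} instances")
```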
Dataset update
We've updated the dataset! Now it includes 1,565 task instances, covering 164 repositories.
Initial dataset release
The initial release of SWE-bench-Live includes 1,319 recent task instances (created after 2024) across 93 repositories, each paired with an instance-level Docker image for test execution.
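For illustration, here is a hedged sketch of how an instance-level image might be used to run tests locally. The image name and test command are hypothetical placeholders, not the benchmark's actual naming scheme; the real values come from each instance's metadata.

```python
# Minimal sketch: pulling an instance image and running its tests via Docker.
# The image name and test entrypoint below are hypothetical placeholders;
# the real values come from each instance's metadata in the dataset.
import subprocess

instance_image = "example-registry/swe-bench-live-instance:latest"  # hypothetical
test_command = "bash /run_tests.sh"  # hypothetical

subprocess.run(["docker", "pull", instance_image], check=True)
result = subprocess.run(
    ["docker", "run", "--rm", instance_image, "bash", "-lc", test_command],
    capture_output=True,
    text=True,
)
print(result.stdout)
```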
Leaderboard
| Rank | Method | Resolved | Date |
|------|--------|----------|------|
Submit your results
We coordinate result submissions via pull requests; see SWE-bench-Live/submissions for instructions.
Acknowledgement
SWE-bench-Live is built upon the foundation of SWE-bench. We extend our gratitude to the original SWE-bench team for their pioneering work in software engineering evaluation benchmarks.
Citation
If you use SWE-bench-Live in your research, please cite:
@article{zhang2025swebenchgoeslive,
  title={SWE-bench Goes Live!},
  author={Linghao Zhang and Shilin He and Chaoyun Zhang and Yu Kang and Bowen Li and Chengxing Xie and Junhao Wang and Maoquan Wang and Yufan Huang and Shengyu Fu and Elsie Nallipogu and Qingwei Lin and Yingnong Dang and Saravan Rajmohan and Dongmei Zhang},
  journal={arXiv preprint arXiv:2505.23419},
  year={2025}
}