Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
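
A minimal sketch of the setting mentioned above, assuming the site is built with Jekyll and that the configuration file is the standard `_config.yml` at the repository root (the comments are illustrative and not part of the original post):

```yaml
# _config.yml (assumed location: the site's root directory)
# When false, Jekyll skips posts whose date is in the future,
# so a post like this one would stay hidden until its date arrives.
future: false
```

Jekyll reads this file only at startup, so a locally running server would likely need a restart before the change takes effect.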

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration

Published in ACL 2025 Findings, 2025

This paper introduces VSCBench, a comprehensive benchmark for evaluating safety calibration in vision-language models.

Recommended citation: J Geng, Q Li, Z Chen, Y Wang, D Zhu, Z Xie, C Lyu, X Chen, P Nakov, et al. (2025). "VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration." ACL 2025 Findings.

Marco-Bench-MIF: On multilingual instruction-following capability of large language models

Published in Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 2025

This paper presents Marco-Bench-MIF, a benchmark for evaluating multilingual instruction-following capabilities of large language models.

Recommended citation: B Zeng, C Lyu, S Liu, M Zeng, M Wu, X Ni, T Shi, Y Zhao, Y Liu, C Zhu, et al. (2025). "Marco-Bench-MIF: On multilingual instruction-following capability of large language models." Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics.

CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

Published in arXiv preprint, 2025

This paper presents CoQuIR, a comprehensive benchmark for evaluating code quality-aware information retrieval systems.

Recommended citation: J Geng, F Cai, S Cui, Q Li, L Chen, C Lyu, H Li, D Zhu, W Pretschner, et al. (2025). "CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval." arXiv preprint arXiv:2506.11066.

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation

Published in Findings of the Association for Computational Linguistics: EMNLP 2025, 2025

This paper introduces CaMMT, a benchmark for evaluating culturally aware multimodal machine translation systems.

Recommended citation: E Villa-Cueva, S Bolatzhanova, D Turmakhan, J Geng, et al. (2025). "CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation." Findings of the Association for Computational Linguistics: EMNLP 2025.

Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.