Cassano et al. (1)

We develop benchmarks that evaluate how well large language models edit code, since previous benchmarks proved insufficient for this task. We also fine-tune models specifically for code editing.