Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation

Ori Shapira; David Gabay; Yang Gao; Hadar Ronen; Ramakanth Pasunuru; Mohit Bansal; Yael Amsterdamer; Ido Dagan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.
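As a rough illustration of the contrast drawn in the abstract, the Python sketch below estimates a content score from a random sample of SCUs instead of an exhaustive comparison. It is a minimal sketch under stated assumptions, not the authors' released crowdsourcing scripts. It assumes that a pool of SCUs has already been collected from the reference summaries, that each sampled SCU receives several yes/no crowd judgments of whether it appears in the system summary, that these judgments are aggregated by majority vote, and that the score is the unweighted fraction of sampled SCUs judged present. The original Pyramid method, by contrast, weights each SCU by the number of reference summaries that express it. The function names and the `sample_size` and `seed` parameters are hypothetical. A Pearson correlation helper is included as one possible way to compare such scores with expert Pyramid scores across systems.

```python
import random
from statistics import mean


def majority_present(votes):
    """True if a strict majority of worker votes judge the SCU present."""
    return sum(votes) * 2 > len(votes)


def lightweight_pyramid_score(scu_pool, judge, sample_size, seed=0):
    """Fraction of randomly sampled SCUs that the crowd judges present.

    scu_pool    -- SCUs collected from reference summaries (assumed given)
    judge       -- hypothetical callable: SCU -> list of boolean worker votes
    sample_size -- number of SCUs to sample (illustrative parameter)
    """
    if not scu_pool or sample_size <= 0:
        raise ValueError("need a non-empty SCU pool and a positive sample size")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(scu_pool, min(sample_size, len(scu_pool)))
    # Unweighted: every sampled SCU counts equally, unlike weighted Pyramid scoring.
    return mean(1.0 if majority_present(judge(scu)) else 0.0 for scu in sample)


def pearson(xs, ys):
    """Pearson correlation, e.g. between per-system crowd and expert scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)
```

For example, if four SCUs are sampled and their majority votes come out present, absent, present, absent, the sketch returns a score of 0.5. Sampling trades some precision for a much smaller number of judgments per summary, which is the kind of cost reduction the abstract motivates.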

Original language	English
Title of host publication	Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Publisher	Association for Computational Linguistics
Pages	682-687
Number of pages	6
Volume	1
Publication status	Published - Jun 2019

Access to Document

https://aclweb.org/anthology/papers/N/N19/N19-1072/

Cite this

Shapira, O., Gabay, D., Gao, Y., Ronen, H., Pasunuru, R., Bansal, M., Amsterdamer, Y., & Dagan, I. (2019). Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol. 1, pp. 682-687). Association for Computational Linguistics. https://aclweb.org/anthology/papers/N/N19/N19-1072/

@inproceedings{652c51e6a6c84fd8a9d8f08d1029a13f,

title = "Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation",

abstract = "Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.",

author = "Ori Shapira and David Gabay and Yang Gao and Hadar Ronen and Ramakanth Pasunuru and Mohit Bansal and Yael Amsterdamer and Ido Dagan",

year = "2019",

month = jun,

language = "English",

volume = "1",

pages = "682--687",

booktitle = "Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",

publisher = "Association for Computational Linguistics",

}

Shapira, O, Gabay, D, Gao, Y, Ronen, H, Pasunuru, R, Bansal, M, Amsterdamer, Y & Dagan, I 2019, Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation. in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. vol. 1, Association for Computational Linguistics, pp. 682-687. <https://aclweb.org/anthology/papers/N/N19/N19-1072/>

Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation. / Shapira, Ori; Gabay, David; Gao, Yang et al.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol. 1 Association for Computational Linguistics, 2019. p. 682-687.

TY - GEN

T1 - Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation

AU - Shapira, Ori

AU - Gabay, David

AU - Gao, Yang

AU - Ronen, Hadar

AU - Pasunuru, Ramakanth

AU - Bansal, Mohit

AU - Amsterdamer, Yael

AU - Dagan, Ido

PY - 2019/6

Y1 - 2019/6

N2 - Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

AB - Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

M3 - Conference contribution

VL - 1

SP - 682

EP - 687

BT - Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

PB - Association for Computational Linguistics

ER -