MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision (EMNLP 2020)


The lack of large and diverse discourse treebanks hinders the application of data-driven approaches, such as deep learning, to RST-style discourse parsing. In this work, we present a novel scalable methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets, creating and publishing MEGA-DT, a new large-scale discourse-annotated corpus. Our approach generates discourse trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient heuristic beam-search strategy, extended with a stochastic component. Experiments on multiple datasets indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains when compared to parsers trained on human-annotated discourse corpora.
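
To make the beam-search idea from the abstract more concrete, here is a toy sketch in Python. It is not the authors' implementation and is not taken from the MEGA-DT code base. The `Node` class, the nucleus weight of 0.75, the distance-based score, and the way random candidates are mixed into the beam are all assumptions chosen for illustration. The sketch merges adjacent spans bottom-up, assigns a nuclearity label to each merge, and keeps the candidates whose aggregated sentiment is closest to a document-level label, plus a few randomly sampled ones.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

NUCLEUS_WEIGHT = 0.75  # assumed weight of the nucleus when aggregating sentiment


@dataclass(frozen=True)
class Node:
    sentiment: float                 # aggregated sentiment of the covered span
    start: int                       # index of the first unit in the span
    end: int                         # index of the last unit (inclusive)
    nuclearity: str = "leaf"         # "NS", "SN", "NN", or "leaf"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Forest = Tuple[Node, ...]


def merge(left: Node, right: Node, nuclearity: str) -> Node:
    """Combine two adjacent spans; the nucleus contributes more sentiment."""
    weight = {"NS": NUCLEUS_WEIGHT, "SN": 1.0 - NUCLEUS_WEIGHT, "NN": 0.5}[nuclearity]
    sentiment = weight * left.sentiment + (1.0 - weight) * right.sentiment
    return Node(sentiment, left.start, right.end, nuclearity, left, right)


def score(forest: Forest, gold: float) -> float:
    """Distance between the length-weighted root sentiment and the label (lower is better)."""
    total = sum(n.end - n.start + 1 for n in forest)
    mean = sum(n.sentiment * (n.end - n.start + 1) for n in forest) / total
    return abs(mean - gold)


def expand(forest: Forest) -> List[Forest]:
    """All forests reachable by merging one adjacent pair with one nuclearity label."""
    result = []
    for i in range(len(forest) - 1):
        for nuclearity in ("NS", "SN", "NN"):
            merged = merge(forest[i], forest[i + 1], nuclearity)
            result.append(forest[:i] + (merged,) + forest[i + 2:])
    return result


def stochastic_beam_search(unit_sentiments: List[float], gold: float,
                           beam_size: int = 4, n_random: int = 1,
                           seed: int = 0) -> Node:
    """Build one binary tree bottom-up with a beam plus random exploration."""
    if not unit_sentiments:
        raise ValueError("need at least one unit")
    rng = random.Random(seed)
    beam: List[Forest] = [tuple(Node(s, i, i) for i, s in enumerate(unit_sentiments))]
    while len(beam[0]) > 1:  # every forest in the beam has the same number of roots
        candidates = [c for forest in beam for c in expand(forest)]
        candidates.sort(key=lambda f: score(f, gold))
        n_best = max(1, beam_size - n_random)
        kept = candidates[:n_best]
        rest = candidates[n_best:]
        # Stochastic component: keep a few candidates outside the top ranks.
        kept += rng.sample(rest, min(n_random, len(rest)))
        beam = kept
    return min(beam, key=lambda f: score(f, gold))[0]


if __name__ == "__main__":
    tree = stochastic_beam_search([0.9, -0.2, 0.4, 0.1], gold=0.5)
    print(tree.nuclearity, tree.start, tree.end, round(tree.sentiment, 3))
```

The random slots in the beam let candidates that look poor at an early level survive long enough to be evaluated at later levels, which is the general motivation for adding a stochastic component to a greedy beam.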

Source code



ACL Anthology


License Agreement


IMPORTANT! The Software/Dataset you seek to use is licensed only on the condition that YOU agree with The University of British Columbia to the terms and conditions set forth below.


If you do not agree to the terms of this agreement, delete and do not use the Software/Dataset.

1) License to use the UBC Software. The MEGA-DT software/dataset (the “Software”) you seek to use is licensed only on the condition that you ("YOU") agree with The University of British Columbia, a corporation continued under the University Act of British Col$

103 – 6190 Agronomy Road,
Vancouver, British Columbia,
V6T 1Z3

to the terms and conditions set forth below. UBC grants to YOU a non-exclusive, non-transferable, non-sublicensable right to use the Software on a single computer at a single location and on the terms and conditions set out in this Agreement, for internal tri$

2) Representation of Authority. YOU represent and warrant to UBC that YOU possess the legal authority to enter into this Agreement, and that YOU will be financially responsible for your use of the Software. You agree to be responsible for all license fees, co$

3) Confidential Information. YOU agree that the Software and any and all documentation, knowledge, know-how and/or techniques relating to the Software, is and will remain the sole and absolute property of UBC. YOU acknowledge that all documentation, trade-mar$

4) Use of Third Party Code. The Software may use or incorporate certain third party code libraries which UBC has obtained under various licenses or permissions. Information on the libraries, and where applicable, source code to the libraries, may be obtained $

5) No Warranty. YOU further acknowledge and agree that the Software is experimental in nature and is provided to YOU on an “as is” basis and for internal evaluation purposes only. UBC has no obligation to provide any services, modifications, upgrades, updates$

6) Limitation of liability. You agree that in no event shall UBC be liable to YOU or any third party for any indirect, consequential, incidental, punitive or special damages whatsoever, without regard to cause or theory of liability, or any damages (whether d$

7) Restrictions of Use. YOU SHALL NOT and will NOT authorize any third party to:
   Make copies of the Software, other than a single backup copy, and any such copy together with the original must be kept in YOUR possession or control. YOU shall reproduce and include the copyright notice of UBC on any backup copy;
   Reverse engineer, reproduce, derive source code, modify, improve, adapt, translate, decompile, disassemble, copy, translate into another computer language, create data or executable programs which mimic data or functionality in the Software, and/or create der$
   Distribute, sell, resell, lease, transfer, loan, assign, trade, rent, publish or otherwise transfer the Software or any part thereof and/or copies thereof, to others;
   License or sublicense the use of the Software to others without the written permission of UBC;
   Use, without its express permission, the name of UBC or any trademark or logo of UBC in advertising, publicity, or otherwise;
   Use the Software, or permit use of the Software, or make the Software or any portion of it, in any form, available for use on the Internet, in a network, multi-user arrangement, remote access arrangement, including without limitation in circumstances where th$
   Remove, disable or circumvent any security protections, proprietary notices or labels contained on or within the Software; and
   Export or re-export the Software or any copy or adaptation, whether in violation of any applicable laws or regulations or otherwise.

8) Indemnification. You agree to indemnify, defend and hold harmless UBC, its board of governors, officers, employees, faculty, students, staff or agents from and against any and all liability, loss, damage, action, claim or expense (including attorney’s fees$

9) Termination. YOU may terminate the license at any time by ceasing all use of the Software and destroying or deleting the Software (including the related documentation), together with all copies in any form. UBC may terminate this license immediately, and t$

10) Governing law. You agree this agreement shall be governed by, interpreted and construed in accordance with, the laws of the Province of British Columbia, and where applicable, the laws of Canada, without regard to any conflict of laws principles that wou$

11) Miscellaneous. No modification of this Agreement will be binding on the parties, unless in writing and signed by an authorized representative of each party. Should any provision of this Agreement be declared invalid or unenforceable, then such provision s$



PyTorch, Copyright (c) 2016- Facebook, Inc (Adam Paszke) obtained under the BSD licence, found here:

Two-Stage Parser, Copyright (c) 2019 Yizhong Wang, found here:

MILNet, Copyright (c) Stefanos Angelidis, found here:



If you use our dataset, code or any parts thereof, please cite this paper:

@inproceedings{huber-carenini-2020-mega,
   title = "{MEGA} {RST} Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision",
   author = "Huber, Patrick and Carenini, Giuseppe",
   booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
   month = nov,
   year = "2020",
   address = "Online",
   publisher = "Association for Computational Linguistics",
   url = "",
   pages = "7442--7457"
}