Tristan Miller

Department of Computer Science · University of Manitoba
+1 204 474 8313 tristan@logological.org

I'm a computational linguist with research interests in lexical semantics, historical online corpora, and computational detection and interpretation of humour.



Publications

Liana Ermakova, Anne-Gwenn Bosser, Adam Jatowt, and Tristan Miller.
The JOKER Corpus: English–French parallel data for multilingual wordplay recognition.
In SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2796–2806, New York, NY, July 2023. Association for Computing Machinery. ISBN 978-1-4503-9408-6. DOI: 10.1145/3539618.3591885.
Despite recent advances in information retrieval and natural language processing, rhetorical devices that exploit ambiguity or subvert linguistic rules remain a challenge for such systems. However, corpus-based analysis of wordplay has been a perennial topic of scholarship in the humanities, including literary criticism, language education, and translation studies. The immense data-gathering effort required for these studies points to the need for specialized text retrieval and classification technology, and consequently for appropriate test collections. In this paper, we introduce and analyze a new dataset for research and applications in the retrieval and processing of wordplay. Developed for the JOKER track at CLEF 2023, our annotated corpus extends and improves upon past English wordplay detection datasets in several ways. First, we introduce hundreds of additional positive examples; second, we provide French translations for the examples; and third, we provide negative examples with characteristics closely matching those of the positive examples. This last feature helps ensure that AI models learn to effectively distinguish wordplay from non-wordplay, and not simply texts differing in length, style, or vocabulary. Our test collection represents then a step towards wordplay-aware multilingual information retrieval.
@inproceedings{ermakova2023joker,
author       = {Liana Ermakova and Anne-Gwenn Bosser and Adam Jatowt and Tristan Miller},
title        = {The {JOKER} {Corpus}: {English}--{French} Parallel Data for Multilingual Wordplay Recognition},
booktitle    = {{SIGIR} '23: Proceedings of the 46th {International} {ACM} {SIGIR} {Conference} on {Research} and {Development} in {Information} {Retrieval}},
pages        = {2796--2806},
month        = jul,
year         = {2023},
publisher    = {Association for Computing Machinery},
address      = {New York, NY},
isbn         = {978-1-4503-9408-6},
doi          = {10.1145/3539618.3591885},
}
Liana Ermakova, Tristan Miller, Anne-Gwenn Bosser, Victor Manuel Palma Preciado, Grigori Sidorov, and Adam Jatowt.
Science for fun: The CLEF 2023 JOKER track on automatic wordplay analysis.
In Jaap Kamps, Lorraine Goeuriot, Fabio Crestani, Maria Maistro, Hideo Joho, Brian Davis, Cathal Gurrin, Udo Kruschwitz, and Annalina Caputo, editors, Advances in Information Retrieval: 45th European Conference on Information Retrieval, ECIR 2023, Dublin, Ireland, April 2–6, Proceedings, Part III, volume 13982 of Lecture Notes in Computer Science (ISSN 0302-9743), pages 546–556, Berlin, Heidelberg, April 2023. Springer. ISBN 978-3-031-28241-6. DOI: 10.1007/978-3-031-28241-6_63.
Understanding and translating humorous wordplay often requires recognition of implicit cultural references, knowledge of word formation processes, and discernment of double meanings – issues which pose challenges for humans and computers alike. This paper introduces the CLEF 2023 JOKER track, which takes an interdisciplinary approach to the creation of reusable test collections, evaluation metrics, and methods for the automatic processing of wordplay. We describe the track's interconnected shared tasks for the detection, location, interpretation, and translation of puns. We also describe associated data sets and evaluation methodologies, and invite contributions making further use of our data.
@inproceedings{ermakova2023science,
author       = {Liana Ermakova and Tristan Miller and Anne-Gwenn Bosser and Victor Manuel {Palma Preciado} and Grigori Sidorov and Adam Jatowt},
editor       = {Jaap Kamps and Lorraine Goeuriot and Fabio Crestani and Maria Maistro and Hideo Joho and Brian Davis and Cathal Gurrin and Udo Kruschwitz and Annalina Caputo},
title        = {Science for Fun: The {CLEF} 2023 {JOKER} Track on Automatic Wordplay Analysis},
booktitle    = {Advances in Information Retrieval: 45th {European} {Conference} on {Information} {Retrieval}, {ECIR} 2023, {Dublin}, {Ireland}, {April} 2--6, Proceedings, Part~{III}},
volume       = {13982},
pages        = {546--556},
series       = {Lecture Notes in Computer Science},
month        = apr,
year         = {2023},
publisher    = {Springer},
address      = {Berlin, Heidelberg},
isbn         = {978-3-031-28241-6},
issn         = {0302-9743},
doi          = {10.1007/978-3-031-28241-6_63},
}
Waltraud Kolb and Tristan Miller.
Human–computer interaction in pun translation.
In James Luke Hadley, Kristiina Taivalkoski-Shilov, Carlos S. C. Teixeira, and Antonio Toral, editors, Using Technologies for Creative-Text Translation, pages 66–88. Routledge, 2022. ISBN 9781003094159. DOI: 10.4324/9781003094159-4.
We present and evaluate PunCAT, an interactive electronic tool for the translation of puns. Following the strategies known to be applied in pun translation, PunCAT automatically translates each sense of the pun separately; it then allows the user to explore the semantic fields of these translations in order to help construct a plausible target-language solution that maximizes the semantic correspondence to the original. Our evaluation is based on an empirical pilot study in which the participants translated puns from a variety of published sources from English into German, with and without PunCAT. We aimed to answer the following questions: Does the tool support, improve, or constrain the translation process, and if so, in what ways? And what are the tool's main benefits and drawbacks as perceived and described by the participants? Our analysis of the translators' cognitive processes gives us insight into their decision-making strategies and how they interacted with the tool. We find clear evidence that PunCAT effectively supports the translation process in terms of stimulating brainstorming and broadening the translator's pool of solution candidates. We have also identified a number of directions in which the tool could be adapted to better suit translators' work processes.
@incollection{kolb2022human,
author       = {Waltraud Kolb and Tristan Miller},
editor       = {James Luke Hadley and Kristiina Taivalkoski-Shilov and Carlos S. C. Teixeira and Antonio Toral},
title        = {Human--Computer Interaction in Pun Translation},
booktitle    = {Using Technologies for Creative-Text Translation},
pages        = {66--88},
year         = {2022},
publisher    = {Routledge},
isbn         = {9781003094159},
doi          = {10.4324/9781003094159-4},
}
Tristan Miller, Camille Paloque-Bergès, and Avery Dame-Griff.
Remembering Netizens: An interview with Ronda Hauben, co-author of Netizens: On the History and Impact of Usenet and the Internet (1997).
Internet Histories: Digital Technology, Culture and Society, 7(1):76–98, 2022. ISSN 2470-1483. DOI: 10.1080/24701475.2022.2123120.
Netizens, Michael and Ronda Hauben's foundational treatise on Usenet and the Internet, was first published in print 25 years ago. In this piece, we trace the history and impact of the book and of Usenet itself, contextualising them within the contemporary and modern-day scholarship on virtual communities, online culture, and Internet history. We discuss the Net as a tool of empowerment, and touch on the social, technical, and economic issues related to the maintenance of shared network infrastructures and to the preservation and commodification of Usenet archives. Our interview with Ronda Hauben offers a retrospective look at the development of online communities, their impact, and how they are studied. She recounts her own introduction to the online world, as well as the impetus and writing process for Netizens. She presents Michael Hauben's conception of “netizens” as contributory citizens of the Net (rather than mere users of it) and the “electronic commons” they built up, and argues that this collaborative and collectivist model has been overwhelmed and endangered by the privatisation and commercialisation of the Internet and its communities.
@article{miller2022remembering,
author       = {Tristan Miller and Camille Paloque-Bergès and Avery Dame-Griff},
title        = {Remembering {Netizens}: {An} Interview with {Ronda} {Hauben}, Co-Author of {Netizens}: {On} the History and Impact of {Usenet} and the {Internet} (1997)},
journal      = {Internet Histories: Digital Technology, Culture and Society},
volume       = {7},
number       = {1},
pages        = {76--98},
year         = {2022},
issn         = {2470-1483},
doi          = {10.1080/24701475.2022.2123120},
}

Projects

Funded research projects

Events & organizations

Software

Publishing & documentation


Miscellany

My interests in language, math, and computers were sparked and strengthened by exposure to the works of Willard R. Espy, Louis Phillips, Mike Keith, Dmitri Borgmann, Jim Butterfield, and others. These writers share a great talent for making technical or linguistic topics fun and accessible to a general audience. You can check out my own contributions to popular and recreational mathematics and linguistics, plus a few other odds and ends.

I also maintain an index of miscellaneous documents and websites I've produced which don't really fit into any other section.