Rui Zhang

Assistant Professor
Computer Science and Engineering Department at Penn State University
W329 Westgate Building, University Park, PA 16802
Email:    rmz5227 __at__ psu.edu   

Google Scholar | GitHub


Dr. Rui Zhang is an Assistant Professor in the Computer Science and Engineering Department at Penn State University and a co-director of the PSU NLP Lab. His research interests lie in Natural Language Processing, Machine Learning, and Artificial Intelligence. He has served as Area Chair at EMNLP 2022, NLPCC 2022, NAACL 2021, EMNLP 2021, and NLPCC 2021. He received the 2020 Amazon Research Award. He received B.S. degrees from both Shanghai Jiao Tong University and the University of Michigan in 2015 and his Ph.D. from the Computer Science Department at Yale University in 2020. During his Ph.D., he did research internships at IBM Thomas J. Watson Research Center, Grammarly Research, and Google AI.

Research Interests

I am broadly interested in Natural Language Processing, Machine Learning, and Artificial Intelligence, with special focus on

News

  • 5/2023: Serving as an Area Chair for NeurIPS 2023 and AACL 2023.
  • 2/2023: Our EvoquerBOT team is participating in the Amazon Alexa Prize TaskBot Challenge 2. "Alexa, Let's work together"!
  • 2/2023: Please join us on the panel on Knowledge, NLP, and LLM at the AAAI 2023 workshop on Knowledge Augmented Methods for NLP!
  • 1/2023: This semester I am teaching CSE 587 Deep Learning for Natural Language Processing.
  • 12/2022: Gave talks at the University of Pennsylvania and the Penn State CSE Seminar on Semantic Parsing in the Era of Large Language Models.
  • 11/2022: Congratulations to Jason for winning the CAFE AI Award for Best Undergraduate Honors Thesis!
  • 10/2022: Three papers on semantic parsing and structured knowledge accepted in EMNLP 2022.
  • 8/2022: Gave invited talks at Amazon, The University of Tokyo, PSU REU Seminar, and MLNLP Seminar on Contrastive Learning for NLP: A Case Study in Few-shot Named Entity Recognition.
  • [Tutorial@NAACL2022] We are presenting a tutorial on Contrastive Data and Learning for Natural Language Processing at NAACL 2022 with Yangfeng Ji, Yue Zhang, and Rebecca Passonneau.
  • [Workshop@NAACL2022] We are co-organizing the Workshop on Structured and Unstructured Knowledge Integration (SUKI) at NAACL 2022.
  • [Workshop@NAACL2022] We are co-organizing the Workshop on Multilingual Information Access (MIA) at NAACL 2022.
  • 6/2022: Congratulations to Yusen and Sarkar on winning the Dr. Tse-Yun Feng Graduate Student Award (outstanding RA award in the CSE department)!
  • 5/2022: Congratulations to graduate students starting their research internships at Amazon, Microsoft, and JPMorgan Chase!
  • 3/2022: Four papers accepted in ACL 2022 on Long-Text Summarization, Few-shot NER, Numerical Reasoning in Tables and Text. Congratulations to Yusen, Sarkar, Yilun, and all co-authors!
  • 1/2022: Welcome to Haoran, who is joining our lab!
  • 9/2021: Two papers accepted in Findings of EMNLP 2021. Congratulations to Yusen and all the co-authors!
  • 8/2021: Welcome to Yusen, Nan, and Sarkar, who are joining our lab!
  • 5/2021: Two papers (1 long and 1 long Findings) are accepted at ACL 2021.
  • 4/2021: Received an Amazon Research Award to work on Conversational QA systems over Tables. Thanks, Amazon!
  • 3/2021: DART is accepted at NAACL 2021.
  • 3/2021: Serving as an Area Chair for the Summarization Track at EMNLP 2021 and NLPCC 2021.
  • 1/2021: SCoRe is accepted at ICLR 2021.
  • 12/2020: Serving as an Area Chair for the Summarization Track at NAACL 2021.
  • 10/2020: Talk at UPenn CLunch.
  • 11/2020: We are organizing the workshop on Interactive and Executable Semantic Parsing (IntEx-SemPar 2020), co-located with EMNLP 2020. Please submit your work!

Publications

2023

XSemPLR: Cross-Lingual Semantic Parsing in Multiple Natural Languages and Meaning Representations
Yusen Zhang, Jun Wang, Zhiguo Wang, Rui Zhang
ACL 2023   [paper] [code]

MACSum: Controllable Summarization with Mixed Attributes
Yusen Zhang, Yang Liu, Ziyi Yang, Yuwei Fang, Yulong Chen, Dragomir Radev, Chenguang Zhu, Michael Zeng, Rui Zhang
TACL 2023   [paper] [code]

ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining
Haoran Zhang, Aysa Xuemo Fan, Rui Zhang
EACL 2023   [paper] [code]

Selective Annotation Makes Language Models Better Few-Shot Learners
Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023   [paper] [code]

2022

XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing
Peng Shi, Rui Zhang, He Bai, Jimmy Lin
Findings of EMNLP 2022   [paper] [code]

ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples
Yilun Zhao, Linyong Nan, Zhenting Qi, Rui Zhang, Dragomir Radev
EMNLP 2022   [paper] [code]

UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu
EMNLP 2022   [paper] [code]

SummN: A Multi-Stage Summarization Framework for Long Input Dialogues and Documents
Yusen Zhang, Ansong Ni, Ziming Mao, Chen Henry Wu, Chenguang Zhu, Budhaditya Deb, Ahmed H. Awadallah, Dragomir Radev, Rui Zhang
ACL 2022   [paper] [code]

CONTaiNER: Few-Shot Named Entity Recognition via Contrastive Learning
Sarkar Snigdha Sarathi Das, Arzoo Katiyar, Rebecca J. Passonneau, Rui Zhang
ACL 2022   [paper] [code]

MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data
Yilun Zhao, Yunxiang Li, Chenying Li, Rui Zhang
ACL 2022   [paper] [code]

DYLE: Dynamic Latent Extraction for Abstractive Long-Input Summarization
Ziming Mao, Chen Henry Wu, Ansong Ni, Yusen Zhang, Rui Zhang, Tao Yu, Budhaditya Deb, Chenguang Zhu, Ahmed H. Awadallah, Dragomir Radev
ACL 2022   [paper] [code]

Contrastive Data and Learning for Natural Language Processing
Rui Zhang, Yangfeng Ji, Yue Zhang, Rebecca J. Passonneau
Tutorial at NAACL 2022    [paper] [website]

MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages
Akari Asai, Shayne Longpre, Jungo Kasai, Chia-Hsuan Lee, Rui Zhang, Junjie Hu, Ikuya Yamada, Jonathan H Clark, Eunsol Choi
Workshop on Multilingual Information Access (MIA) at NAACL 2022   [paper] [code]

FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Luke Benson, Lucy Sun, Ekaterina Zubova, Yujie Qiao, Matthew Burtell, David Peng, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Shafiq Joty, Alexander R Fabbri, Wojciech Kryscinski, Xi Victoria Lin, Caiming Xiong, Dragomir Radev
Preprint   [paper] [code]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
with many authors in the BIG-bench Team
TMLR 2023   [paper] [code]

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
with many authors in the BigScience Research Workshop
Preprint   [paper] [code]

FeTaQA: Free-form Table Question Answering
Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev
TACL 2022   [paper] [code]

2021

Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
Anup Sarma, Sonali Singh, Huaipan Jiang, Rui Zhang, Mahmut T. Kandemir, Chita R. Das
NeurIPS 2021   [paper]

An Exploratory Study on Long Dialogue Summarization: What Works and What's Next
Yusen Zhang*, Ansong Ni*, Tao Yu, Rui Zhang, Chenguang Zhu, Budhaditya Deb, Asli Celikyilmaz, Ahmed Hassan Awadallah, Dragomir Radev
*: Equal Contribution
Findings of EMNLP 2021   [paper]

TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation
Adaku Uchendu, Zeyu Ma, Thai Le, Rui Zhang, Dongwon Lee
Findings of EMNLP 2021   [paper]

Cross-language Sentence Selection via Data Augmentation and Rationale Training
Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuscakova, Rui Zhang, Douglas Oard, Kathleen McKeown
ACL 2021   [paper] [code]

Logic-Consistency Text Generation from Semantic Parses
Chang Shu*, Yusen Zhang*, Xiangyu Dong, Peng Shi, Tao Yu, Rui Zhang
*: Equal Contribution
Findings of ACL 2021   [paper] [code]

DART: Open-Domain Structured Data Record to Text Generation
Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Murori Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
NAACL 2021   [paper] [code]

SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing
Tao Yu, Rui Zhang, Oleksandr Polozov, Christopher Meek, Ahmed Hassan Awadallah
ICLR 2021   [paper]

2020 and before

ESPRIT: Explaining Solutions to Physical Reasoning Tasks
Nazneen Fatema Rajani*, Rui Zhang*, Yi Chern Tan, Stephan Zheng, Jeremy Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, Dragomir Radev
*: Equal Contribution
ACL 2020   [paper] [code]

MATERIALizing Cross-Language Information Retrieval: A Snapshot
Petra Galuscakova, Douglas Oard, Joe Barrow, Suraj Nair, Shing Han-Chin, Elena Zotkina, Ramy Eskander, Rui Zhang
LREC 2020 Workshop on CLSSTS   [paper]

Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions
Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev
EMNLP 2019   [paper] [code]

CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter Lasecki and Dragomir Radev
EMNLP 2019   [paper] [dataset and leaderboard]

Improving Low-Resource Cross-lingual Document Retrieval by Reranking with Deep Bilingual Representations
Rui Zhang, Caitlin Westerfield, Sungrok Shim, Garrett Bingham, Alexander Fabbri, Neha Verma, William Hu, Dragomir Radev
ACL 2019   [paper] [slides]

This Email Could Save Your Life: Introducing the Task of Email Subject Line Generation
Rui Zhang, Joel Tetreault
ACL 2019   [paper] [dataset]

SParC: Cross-Domain Semantic Parsing in Context
Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, Dragomir Radev
ACL 2019   [paper] [dataset and leaderboard]

Surprise Languages: Rapid-Response Cross-Language IR
with Douglas W. Oard, Petra Galuscakova, Kathleen McKeown, Dragomir Radev and many authors
EVIA 2019   [paper]

ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander Fabbri, Irene Li, Dan Friedman, Dragomir Radev
AAAI 2019   [paper] [dataset]

Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, Dragomir Radev
EMNLP 2018   [paper] [dataset and leaderboard]

SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, Dragomir Radev
EMNLP 2018   [paper] [code]

Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering
Rui Zhang, Cicero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang, Dragomir Radev
ACL 2018   [paper]

Improving Text-to-SQL Evaluation Methodology
Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, Dragomir Radev
ACL 2018   [paper] [code]

TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation
Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, Dragomir Radev
NAACL 2018   [paper] [code]

Addressee and Response Selection in Multi-Party Conversations with Speaker Interaction RNNs
Rui Zhang, Honglak Lee, Lazaros Polymenakos, Dragomir Radev
AAAI 2018   [paper] [code]

Graph-based Neural Multi-Document Summarization
Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan, Dragomir Radev
CoNLL 2017   [paper]

Effects of Creativity and Cluster Tightness on Short Text Clustering
Catherine Finegan-Dollak, Reed Coke, Rui Zhang, Xiangyi Ye, Dragomir Radev
ACL 2016   [paper]

Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
Rui Zhang, Honglak Lee, Dragomir Radev
NAACL 2016   [paper]

Teaching

CSE 587 Deep Learning for Natural Language Processing, Spring 2022, Spring 2023.
CMPSC 448 Machine Learning, Fall 2020, Fall 2021, Fall 2022.
CMPSC 442 Artificial Intelligence, Spring 2021.

Talks

Workshop and Tutorial

Service

Funding

We thank Amazon AWS, Amazon Alexa, eBay, and Cisco for their support.