Collecting low-resource language data for AI technology development has always been difficult, and the exponentially growing data requirements of the current paradigm of training large language models risk marginalizing these languages further. Although some community-driven data collection methods have been developed, various ethical issues remain to be considered: many of these languages are spoken in the Global South, where such technology development might not always benefit the communities involved and could even harm them.

This workshop invites researchers with diverse experiences—including but not limited to community-based research, low-resource language technology development, data collection and analysis—as well as practitioners, developers, and community representatives to join us in discussing potential pathways for community-driven data practices for low-resource language technologies and the ethical challenges associated with them. Our goal is for this workshop to serve as a platform for sharing ideas and fostering collaboration.

If you are interested in attending, please fill out this Google Form to help us prepare.

For more details, you can read our full workshop proposal.

Workshop Schedule

The workshop is scheduled for Wednesday, July 23, from 2:00 PM to 6:00 PM. It will be held in a hybrid format: in person in Toronto and online via Zoom. Please refer to the COMPASS 2025 website for details on the in-person location.

All times are Eastern Time (ET).

Time            Activity
2:00-2:30 PM    Welcome, quick introductions, and ice-breaker ideation session
2:30-3:30 PM    Panel discussion
3:30-3:45 PM    Break
3:45-4:45 PM    Breakout sessions: small-group discussions on case studies
4:45-5:15 PM    Report-back and synthesis
5:15-6:00 PM    Closing, next steps, and open networking

Panelists

Organizers

You can reach out to us at comdataworkshop@gmail.com.