Making Transparency Work for Harvard's Dataverse Project

Making research data open and available is easier when we all work transparently.

Philip Durbin

A culture of transparency permeates the Dataverse project, contributing to its adoption in dozens of research institutions around the world. Headquartered at Harvard University, the Dataverse development team has more than a decade of experience operating as an open source project within an organization that values transparency: the Institute for Quantitative Social Science (IQSS). Working transparently helps the Dataverse team communicate changes to current development efforts, provides opportunities for the community to support each other, and facilitates contribution to the project.

Dataverse is open source research data repository software, a platform for sharing and exploring research data. In June 2007, Dataverse developers published the first open source commit to the project, but precursors to Dataverse date back to 1987. With help from the community, the Dataverse development team completed a code rewrite in 2016, which led to significant growth in adoption. As of 2017, 26 institutions around the world run Dataverse in production, and three installations, including Harvard Dataverse, offer data hosting to any researcher in the world.

Transparency from top to bottom

The Dataverse project emerged from IQSS, an organization that promotes visibility into its various operations. The IQSS webpage of roadmaps indicates the institute's level of commitment to transparency, stating:

We maintain these development roadmaps publicly so that all our faculty, students, and staff can remain on, or at least work from, the same page; we give everyone complete visibility into what IQSS is working toward, how we are going about it, and when we plan to get there. For each area of development, you will find the big picture and ways to drill down to whatever level of detail you desire; for some, you will find ongoing community discussion forums, and in many you can even see the raw computer code we are in the process of writing. We realize that this level of transparency is highly unusual in a large complex environment like Harvard, but our research indicates that we do a much better job when involving our community in our operations and empowering its members to fuel the growth and improvement of our products, services, and activities.

Support for transparency from its parent organization helps the Dataverse team feel comfortable opening up. The project's public roadmap provides an overview with links to dive into fine detail of any particular feature or bug. The roadmap also enumerates the project's strategic goals, helping set expectations with the community regarding project priorities. The team communicates changes to the roadmap on the public "dataverse-community" mailing list as well as on biweekly community calls. (The calls are not recorded but participants collaborate on taking notes and send them to the public mailing list.) At the annual community meeting, the team presents the roadmap and reflects on accomplishments from the past year. Planned releases on the roadmap link to a public kanban board, displaying the status of various issues as they move from the backlog to development, code review, and QA. In short, development is an open book, and the community can follow along with every chapter and even help tell the story.

Transparency in support

As adoption of Dataverse has grown, the community has become better able to support itself. Community members ask and answer questions in public channels, building a knowledge base accessible by any search engine. The team encourages the community to be bold about posting questions to the mailing list and the publicly logged IRC channel. The community is eager to help all members succeed, and the strengths of individual community members shine through.

Transparency in contribution

Increasing contribution is one of the strategic goals of the Dataverse project, and the team actively asks questions that encourage public contribution to the project, such as the following:

  • "Can you please open an issue?"
  • "Are you interested in making a pull request?"
  • "Can you please participate in a community call so we can hear more about your idea?"
  • "Can you please start a thread about your idea on the mailing list?"

The community contributes by providing ideas, improving documentation, participating in usability tests, and writing code. Recently the team started tracking community development efforts on a public spreadsheet so that it's clear who is working on what and so that contributors can provide status updates on various initiatives. As members of the community make progress, an issue number is added to the spreadsheet, followed by a pull request number. By having conversations in the open, community members keep abreast of the current efforts of their counterparts at other institutions, or even at the same institution.

Challenges in transparency

Transparency is not without its challenges. Open source newcomers, both internal and external to the Dataverse project, can find a high level of transparency scary. What if people don't like my code or my design? What if the issue I open is a false alarm? What if my problem is due to something I did wrong? Nagging doubts like these are normal, and as a community we must constantly remind each other that we'd rather hear an imperfect idea than nothing at all.

Security deserves special mention in the context of transparency. Like many projects, Dataverse has a private email address for receiving reports of suspected security vulnerabilities. It would be irresponsible to put customers at risk with completely open discussion of security concerns.

Transparency in design presents some challenges as well. In a talk at Ohio Linux Fest 2017, Máirín Duffy from Red Hat explains "the big reveal" from design culture and how the "release early and often" mantra heard so often in open source can be difficult for designers who prefer sharing curated, polished designs. Lately the Dataverse project has been posting unpolished designs on a separate kanban board that is public but not announced. Mockups from the board appear in usability tests and are refined until they are ready to be included in a development sprint.

Results

A positive response from the Dataverse community during a retrospective at the most recent community meeting has encouraged the team to continue working transparently. The community loves the community calls and only wants more notice about agenda topics and reminders to call in. They appreciate the open roadmap. They asked for issues to be flagged with "help wanted" if there is a way they can contribute. They asked for assistance understanding the system used to write documentation so they can help. In short, more information is better. Putting information into the open—rather than in private emails and messages—maximizes the value of our keystrokes.

This article is part of the Open Organization Workbook project.

Making Transparency Work for Harvard's Dataverse Project was authored by Philip Durbin and published in Opensource.com. It is being republished by Open Health News under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). The original copy of the article can be found here.