Data Science and Research Software Engineering Collaboration

Data science and software engineering play an important role in research by creating new capabilities to process and analyze data, helping ensure reproducibility, and aiding researchers in extracting knowledge and insight from the data. The term software is used broadly here to include all the ways in which one creates and analyzes data. Researchers use software in many forms, including scripts, tools, open-source software, and licensed software. Data science also covers a wide range of skills and techniques for data cleaning (also known as wrangling), processing, and statistical analysis that are often beyond the expertise of a researcher in a specific domain. Because research evolves rapidly, existing code does not always provide every function needed, nor are clean data sources always available; as a result, software and data pipelines are often developed specifically for a given project. Traditionally, this development was done by researchers (graduate students and postdocs) or by independent contractors, an approach that poses several issues in terms of maintenance, optimization, reproducibility, and cost. Research Software Engineering (RSE) and data science teams can work closely with other Research Computing Systems teams to design, develop, deploy, optimize, and maintain software packages, tools, and data pipelines that are paired with specific hardware architectures to accelerate cutting-edge research at Harvard University.

Eligibility information is outlined below, grouped by providers whose offerings are available to the entire Harvard community or to a specific unit or appointment.

University-wide

Faculty of Arts and Sciences, Research Computing

The following services are offered by the RSE team:

  • Development of scientific software packages
  • Development of functional and robust UI/UX
  • Addition of critical features to existing codebases
  • Maintenance of the current codebases developed by researchers
  • Development of Machine Learning/Big Data/Deep Learning apps and platforms
  • Development of data acquisition and analysis automation platforms
  • Performance improvement of existing software packages
  • Complex database design and deployment

Tier 1: Small, single tasks (free)  
Tier 2: Individual project with a defined statement of work (SOW) and start and end dates  
Tier 3: Product with ongoing development and operations, covered by a service-level agreement (SLA)  

Audience

Available to all research groups with an FASRC account.

Service Provider

Faculty of Arts and Sciences, Research Computing (FASRC)

Service Fee

None

Service Website

Submit all RSE requests by booking a consultation appointment at https://www.rc.fas.harvard.edu/consulting-calendar/ or by emailing rchelp@rc.fas.harvard.edu.

Contact Information

Contact Mahmood Shad at rchelp@rc.fas.harvard.edu 

Institute for Quantitative Social Science

IQSS offers extended support over the lifecycle of a research project by embedding a data science specialist in your research team. We can design and implement a data analysis pipeline for many stages of your research project and/or develop a prototype of your research-focused software tool. Specifically, we can help with the following:

  • Writing reproducible, version-controlled code (R, Python, C, C++)
  • Data organization and cleaning
  • Model estimation and post-estimation
  • Visualization of raw data and model output
  • Interpretation of results
  • Writing methods and results sections of papers
  • Responding to peer reviews of our analyses
  • Developing tool prototypes in R / Python

Audience

All social science researchers

Service Provider

Institute for Quantitative Social Science

Service Fee

$100/hr

Service Website

https://www.iq.harvard.edu/data-science-services 

Contact Information

Contact Steve Worthington at help@iq.harvard.edu

Unit/Appointment-specific

Harvard Business School

The following services are offered by the RSE team:

  • Development of scientific software codes 
  • Addition of critical features to existing codebases  
  • Maintenance of the current codebases developed by researchers  
  • Development of Machine Learning/Big Data/Deep Learning codes  
  • Development of data acquisition and analysis automation  
  • Performance improvement of existing software codes 
  • Complex database design and deployment  

Tier 1: Small, single tasks (free)  
Tier 2: Individual project with a defined statement of work (SOW) and start and end dates  
Tier 3: Product with ongoing development and operations, covered by a service-level agreement (SLA)  

Audience

Available to all research groups at HBS

Service Provider

Harvard Business School

Service Fee

None

Service Website

None

Contact Information

Contact Bob Freeman at research@hbs.edu 

Quantitative Biomedical Research Center (Harvard Chan School)

The Quantitative Biomedical Research Center can assist with:

  • Software design, implementation, optimization, refactoring, and maintenance: Every phase of the software development life cycle from pre-publication to post-publication
  • Cloud computing: From budgeting for grant applications to implementing, securing, and maintaining infrastructure as code
  • Data management: Storage and sharing of large ‘omics data sets
  • Data visualization: Interactive dashboards using R Shiny and other tools
  • Software package management: Containers or Conda packages for your software tools to facilitate distribution
  • Customized analysis pipeline construction: Leveraging the Center's Cloud Native Application Platform (CNAP), this customized solution offers fast deployment, easy dissemination, and the ability to process large data volumes in a secure and highly reproducible environment.
  • Many other areas of scientific computing: HPC, web development, etc.  

Audience

Harvard community, other academic institutions, and industry

Service Provider

Quantitative Biomedical Research Center 

Service Fee

From $145/hour

Service Website

https://www.hsph.harvard.edu/qbrc/services/

Contact Information

Contact the QBRC at qbrc@hsph.harvard.edu