Get Smart with ETL for GitHub

Emma Kessinger

May 29th, 2020

In January 2020, GitHub reportedly had over 40 million users and more than 100 million repositories. This makes GitHub the largest host of source code in the world. 

GitHub provides users with popular Git functionalities, such as distributed version control and source code management, as well as its own features like access control and collaboration tools. It also offers bug tracking, feature requests, task management, and wikis.

With that much data being shared globally via GitHub, it’s important that collaborators can organize, understand, and pull insights from it. An ETL process can continuously load all your GitHub data into a data warehouse, where trends can be mined over time. Your GitHub ETL process should capture the following data:
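As a rough illustration of the extract, transform, and load steps, here is a minimal sketch in Python. It assumes the commit records have already been extracted; the `RAW_COMMITS` sample, the field names, and the `commits` table are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical records, shaped loosely like rows an extractor might return.
RAW_COMMITS = [
    {"sha": "a1", "author": "alice", "date": "2020-05-01T10:00:00Z"},
    {"sha": "b2", "author": "bob", "date": "2020-05-02T11:30:00Z"},
    {"sha": "a1", "author": "alice", "date": "2020-05-01T10:00:00Z"},  # duplicate from a re-run
]


def transform(records):
    """Keep only the fields the table needs and trim the date to YYYY-MM-DD."""
    return [(r["sha"], r["author"], r["date"][:10]) for r in records]


def load(conn, rows):
    """Insert rows, skipping any sha already present so repeated runs add no duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits (sha TEXT PRIMARY KEY, author TEXT, day TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO commits VALUES (?, ?, ?)", rows)
    conn.commit()


def commits_per_author(conn):
    """Return a {author: commit count} mapping from the loaded table."""
    return dict(conn.execute("SELECT author, COUNT(*) FROM commits GROUP BY author"))


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection
load(conn, transform(RAW_COMMITS))
print(commits_per_author(conn))
```

Keying the table on the commit sha is one simple way to make a continuously running load safe to repeat; a production pipeline would likely handle this differently.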

Collaborators

A collaborator is someone on the core development team of a project who has commit access to its main repository. By extracting collaborator and contribution data into a single place, ETL tools let users take a deeper look at who is contributing what to the project. They can identify MVPs by the number of commits, lines added or changed, where those changes are made, and whether the areas they’re working on are bug-free.

Comments

For any Pull Request, GitHub provides three kinds of comment views: comments on the Pull Request as a whole, comments on a specific line within the Pull Request, and comments on a specific commit within the Pull Request. Collecting comments with ETL and categorizing them can help identify who is actively reviewing code and how often. 
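Once comments are collected, categorizing them can be as simple as a pair of tallies. The sketch below assumes each extracted comment carries a `kind` label (`pull_request`, `line`, or `commit`) and a `user` name; both field names and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical comment records; "kind" is an assumed label added during extraction.
COMMENTS = [
    {"kind": "pull_request", "user": "alice"},
    {"kind": "line", "user": "bob"},
    {"kind": "line", "user": "bob"},
    {"kind": "commit", "user": "alice"},
    {"kind": "line", "user": "carol"},
]


def comments_by_kind(comments):
    """Count comments in each of the three categories."""
    return Counter(c["kind"] for c in comments)


def comments_by_reviewer(comments):
    """Count how many comments each person has left."""
    return Counter(c["user"] for c in comments)


print(comments_by_kind(COMMENTS))
print(comments_by_reviewer(COMMENTS))
```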

Analyzing those discussions can prove useful in all sorts of ways. If comments rapidly grow in a certain area, developers can dedicate their resources to it. That, in turn, can clear up customer support issues stemming from that section of the codebase. 

Commits

Commits are code updates developers make to a project. Tracking commits can give a sense of who is contributing the most, and where they are putting their efforts.

Frequency, area, and type of maintenance have all sorts of product development implications. If commits go crazy in one area, is it because of a new feature, or is an old one acting up? If it’s the latter, it could be worth the company’s while to build a completely new component. 

Identifying squeaky wheels is also important for improving the user experience. The key is to compare commits to customer service issues: If hundreds of new commits are found in an area related to common complaints, it’s worth asking if recent commits might be the root cause. 
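One way to make that comparison concrete is to group both commits and support tickets by area of the codebase and flag the areas where both are busy. The sketch below is only an illustration: it treats the top-level directory of each changed file as the "area", and the sample records, field names, and thresholds are all assumptions.

```python
from collections import Counter

# Hypothetical extracted data: files touched per commit, and an area tag per ticket.
COMMITS = [
    {"sha": "a1", "files": ["billing/invoice.py", "billing/tax.py"]},
    {"sha": "b2", "files": ["billing/invoice.py", "search/index.py"]},
    {"sha": "c3", "files": ["search/query.py"]},
    {"sha": "d4", "files": ["docs/readme.md"]},
]
SUPPORT_TICKETS = [
    {"id": 1, "area": "billing"},
    {"id": 2, "area": "billing"},
    {"id": 3, "area": "search"},
]


def area_of(path):
    """Treat the top-level directory as the area of the codebase."""
    return path.split("/", 1)[0]


def commits_per_area(commits):
    """Count each commit once per area, even if it touches several files there."""
    counts = Counter()
    for commit in commits:
        for area in {area_of(p) for p in commit["files"]}:
            counts[area] += 1
    return counts


def hot_spots(commits, tickets, min_commits=2, min_tickets=2):
    """Return areas that meet both the commit and the ticket thresholds."""
    commit_counts = commits_per_area(commits)
    ticket_counts = Counter(t["area"] for t in tickets)
    return sorted(
        area
        for area in commit_counts
        if commit_counts[area] >= min_commits and ticket_counts[area] >= min_tickets
    )


print(hot_spots(COMMITS, SUPPORT_TICKETS))
```

An overlap like this only points to where to look; it does not show that the commits caused the complaints.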

Pull Requests 

Pull requests are proposed changes to a repository submitted by a developer and accepted or rejected by a repository's collaborators. They tell other users that you’ve pushed changes to a certain branch of the software. Collecting data on pull requests via ETL can help you see who’s asking for feedback and how often.

Analyzing pull requests helps hold contributors accountable. But more importantly, it ensures that your team is working together effectively. 

Pull Request Reviews 

Pull request reviews are comments from collaborators on a pull request that approve the changes or request further changes before the pull request is merged.  

ETL users can get much the same insights from pull request reviews as they can from pull requests themselves. How frequently are collaborators providing feedback?
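A simple starting point for answering that question is a count of reviews per collaborator, broken down by outcome. In the sketch below, the sample records and field names are hypothetical; the `state` values mirror the approve, request changes, and comment outcomes a review can have.

```python
from collections import Counter, defaultdict

# Hypothetical review records, one per submitted review.
REVIEWS = [
    {"pull_number": 7, "user": "alice", "state": "APPROVED"},
    {"pull_number": 7, "user": "bob", "state": "CHANGES_REQUESTED"},
    {"pull_number": 8, "user": "alice", "state": "COMMENTED"},
    {"pull_number": 9, "user": "alice", "state": "APPROVED"},
]


def reviews_by_collaborator(reviews):
    """Map each reviewer to a Counter of the review states they submitted."""
    summary = defaultdict(Counter)
    for review in reviews:
        summary[review["user"]][review["state"]] += 1
    return summary


for user, states in reviews_by_collaborator(REVIEWS).items():
    print(user, sum(states.values()), dict(states))
```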

Assignees

Often confused with a reviewer, an assignee is a person dedicated to working on a specific issue or pull request. By retrieving information about assignees through ETL, a company can understand who’s working on what issues and over what time frame. As with reviews, that’s critical for understanding workflows and hiring needs.
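To show what "who’s working on what and over what time frame" might look like in practice, here is a small sketch that groups issues by assignee and totals the days each issue has been open. The sample records and field names are hypothetical, and the issue's creation date is used as a stand-in for the date it was assigned.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format assumed for the sample data

# Hypothetical issue records; closed_at is None while an issue is still open.
ISSUES = [
    {"number": 101, "assignee": "alice",
     "created_at": "2020-05-01T09:00:00Z", "closed_at": "2020-05-04T09:00:00Z"},
    {"number": 102, "assignee": "bob",
     "created_at": "2020-05-02T09:00:00Z", "closed_at": None},
    {"number": 103, "assignee": "alice",
     "created_at": "2020-05-10T09:00:00Z", "closed_at": "2020-05-11T09:00:00Z"},
]


def days_open(issue, now):
    """Whole days between creation and close (or `now` if still open)."""
    start = datetime.strptime(issue["created_at"], FMT)
    end = datetime.strptime(issue["closed_at"], FMT) if issue["closed_at"] else now
    return (end - start).days


def workload(issues, now):
    """Map each assignee to their issue numbers and total days spent."""
    summary = {}
    for issue in issues:
        entry = summary.setdefault(issue["assignee"], {"issues": [], "total_days": 0})
        entry["issues"].append(issue["number"])
        entry["total_days"] += days_open(issue, now)
    return summary


NOW = datetime(2020, 5, 12, 9, 0, 0)  # fixed reference time so the sketch is repeatable
print(workload(ISSUES, NOW))
```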

GitHub holds all sorts of insights for software developers. But without a tool like ETLrobot to continuously load all your data into your data warehouse, it’s tough to get a true understanding of a software project’s performance, how well a team is working together, or the quality of committed code. And a high-level view of those things can make a real bottom-line difference.


Copyright © 2020 ETLrobot. All rights reserved.