How To Scrape GitHub Profiles to Find Top Talent in 30 Seconds

Marc Howard
4 min read · Aug 8, 2019

The most talented developers in the world can be found on GitHub. What if there was an easy, fast and free way to find, rank and recruit them? I’ll show you exactly how to do this in less than a minute using free tools and a process that I’ve hacked together to vet top tech talent at BizPayO.

For many of the side projects I work on, collaborating with the best developers in machine learning/AI, blockchain, and cryptocurrency is often a matter of checking out the most popular projects on GitHub and then helping each other with open-source work.

This is a short tutorial on how to scrape data on the top developers from GitHub and export quickly into a spreadsheet.

Use Cases:

  • Find the most talented developers to collaborate with (i.e. most-followed JavaScript developers in San Francisco, CA)
  • Recruit top developers based on their skillset and collaboration activity

The best part is that these free tools are not limited to just GitHub. You can use them for these other use cases:

  • Finding the best products and prices on Amazon or eBay
  • Getting business contact details from YP or Yelp (i.e. building an outreach list with highly rated successful businesses)

Let’s Begin

In this example we’ll scrape GitHub to find the names, locations, and (if provided) email addresses of the most-followed JavaScript developers in San Francisco.

You’ll only need two free Chrome extensions: Autopagerize and Instant Data Scraper.

Autopagerize simply auto-loads the next page of any paginated website. It works in all major browsers, including Firefox, Chrome, Opera, and Safari.

Instant Data Scraper uses AI to detect tabular or listing-type data on web pages. That data can be scraped into a CSV or Excel file, no coding skills required. The extension can also click “Next” page links or buttons and collect data from multiple pages into one file. Pretty sweet.

Step 1: Download the Autopagerize Chrome plugin. It appends the second, third, and subsequent search-results pages to the bottom of the current page, creating one long page that contains all the results (or as many as you wish). As mentioned above, it works on Google search results, GitHub, Amazon, Yelp, and several other sites.

Step 2: Download the Instant Data Scraper plugin.

Step 3: Go to the URL you want to scrape. In this case we’re grabbing the top JavaScript developers in SF on GitHub; here is the page to start on, sorted by followers: https://github.com/search?l=JavaScript&o=desc&p=7&q=stars%3A%3E1000+location%3A%22San+Francisco%22+location%3ACA+followers%3A%3E10+language%3AJavaScript&s=followers&type=Users
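If you want to understand or tweak that search URL, it’s just GitHub’s user search with a handful of qualifiers packed into the `q` parameter. Here’s a minimal sketch, using only Python’s standard library, that builds an equivalent URL from the same qualifiers as the link above (the qualifier values are copied from that link, not anything new):

```python
from urllib.parse import urlencode

# Qualifiers from the search link above: JavaScript developers in
# San Francisco, CA with >10 followers and >1000 stars.
query = " ".join([
    "stars:>1000",
    'location:"San Francisco"',
    "location:CA",
    "followers:>10",
    "language:JavaScript",
])

params = {
    "q": query,
    "type": "Users",      # search for user profiles, not repos
    "l": "JavaScript",
    "s": "followers",     # sort field
    "o": "desc",          # sort order: most-followed first
}

url = "https://github.com/search?" + urlencode(params)
print(url)
```

Swap any qualifier (say, `location:"New York"` or `language:Python`) and you get a new starting page for the same scraping workflow.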

(Fun fact: The first guy’s last name is pronounced “Boss”-Stock. Pretty bad-ass huh?)

Step 4: Click the Autopagerize plugin in your browser, then click Next for as many pages as you need. You’ll then have a list similar to the following:

That’s it! You should now have a spreadsheet of all the data matching the criteria above.

Again, this method can be used on other popular sites to quickly gather, extract, and sort the data you need.

If you’ve found this article helpful please share or clap so that others can find it. If you have any questions feel free to reply or reach out to me directly on Twitter @marcbegins.

Happy scraping!

Marc Howard

Helping accountants get to their first $1M and beyond.