Bubble sort is a simple sorting algorithm that repeatedly steps through the list to be sorted, compares each pair of adjacent items and swaps them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted.

Time complexity:

Average case : O(n²)

Worst case : O(n²)

Best case : O(n)
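The description above can be sketched in a few lines of Python (a minimal illustration; the early-exit check on swaps is what gives the O(n) best case for an already-sorted list):

```python
def bubble_sort(items):
    """Sort a list in place by repeatedly swapping adjacent out-of-order pairs."""
    n = len(items)
    for i in range(n - 1):
        swapped = False
        # after pass i, the largest i+1 elements are in their final positions
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:  # a full pass with no swaps means the list is sorted
            break
    return items
```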

[Read More]


What is the Fizz-Buzz test?

It’s a programming interview question designed to filter out candidates who can’t program.

Problem statement : Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Out of curiosity, I decided to solve this problem.

My first attempt :

for ix in range(1, 101):
    if ix % 3 == 0 and ix % 5 == 0:
        print("FizzBuzz")
    elif ix % 5 == 0:
        print("Buzz")
    elif ix % 3 == 0:
        print("Fizz")
    else:
        print(ix)
[Read More]

concurrent.futures in Python 3

The concurrent.futures module provides a common high level interface for asynchronously executing callables using pools of threads or processes.

The concurrent.futures.Executor is an abstract class for executing function calls asynchronously. Its important methods are submit(fn, *args), which schedules a single call to the given function with the given arguments, and map(fn, iterable), which applies the function asynchronously to each item of the iterable as a separate call. Executor should not be used directly; instead, use one of its concrete subclasses, ThreadPoolExecutor or ProcessPoolExecutor.

Let’s jump into an example. The purpose of the following program is to find the sum of all prime numbers until the given number. There are two functions to demonstrate how to use a pool of threads and how to use a pool of processes. sum_primes_thread(nums) uses threads and sum_primes_process(nums) uses processes. Notice that the only difference between the two functions is that one uses ThreadPoolExecutor while the other uses ProcessPoolExecutor.
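A sketch of the program described above, under the assumption that a simple trial-division primality test is used (the helpers is_prime and sum_primes are guesses at the post's internals; sum_primes_thread and sum_primes_process match the names in the summary):

```python
import concurrent.futures


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True


def sum_primes(n):
    """Sum of all primes up to and including n."""
    return sum(x for x in range(2, n + 1) if is_prime(x))


def sum_primes_thread(nums):
    # map() schedules one sum_primes(n) call per item in nums
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(sum_primes, nums))


def sum_primes_process(nums):
    # identical to the above except for the executor class
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        return list(executor.map(sum_primes, nums))
```

Note that ProcessPoolExecutor needs its worker function to be picklable, so on some platforms the calls must sit under an `if __name__ == "__main__":` guard.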

[Read More]

Multiprocessing and multithreading in Python 3

To begin with, let us clear up some terminology:

  • Concurrency is when two or more tasks can start, run, and complete in overlapping time periods. It doesn’t necessarily mean they’ll ever both be running at the same instant; e.g., multitasking on a single-core machine.

  • Parallelism is when two or more tasks are executed simultaneously.

  • A thread is a sequence of instructions within a process. It can be thought of as a lightweight process. Threads share the same memory space.

  • A process is an instance of a program running on a computer and can contain one or more threads. A process has its own independent memory space.

The threading module is used for working with threads in Python.

The CPython implementation has a Global Interpreter Lock (GIL) which allows only one thread to be active in the interpreter at once. This means that threads cannot be used for parallel execution of Python code. While parallel CPU computation is not possible, parallel IO operations are possible using threads. This is because performing IO operations releases the GIL. To learn more about the GIL refer here.
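As a minimal sketch of that last point, the program below runs several IO-like tasks on separate threads. Here time.sleep stands in for a blocking IO call: sleeping, like real IO, releases the GIL, so the waits overlap instead of running back to back:

```python
import threading
import time


def io_task(results, index):
    # time.sleep releases the GIL, standing in for a blocking IO call
    time.sleep(0.2)
    results[index] = index * 2


def run_threads(n):
    """Run n IO-like tasks on separate threads and collect their results."""
    results = [None] * n
    threads = [threading.Thread(target=io_task, args=(results, i)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return results
```

With five tasks, the sequential version would take about a second; the threaded version takes roughly the length of one sleep, because the threads wait concurrently.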

[Read More]

Analyzing programming language statistics of 100,000 Github repositories

The first step is to gather data about 100,000 repositories using the Github api. I used scrapy for this.

A high level overview of how I did this:

  1. Start from the id of my scrape_github repo https://api.github.com/repositories?since=76761293&access_token=MY_TOKEN

  2. Save only the id, name and languages_url for each repo. The languages_url is the api endpoint which contains the programming language statistics of the current repo.

  3. Extract the link to the next page from the Link header and follow it repeating the above steps.
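Step 3 can be sketched with a small parser that pulls the rel="next" URL out of a Link header (next_page_url is a hypothetical helper for illustration, not code from the original spider):

```python
def next_page_url(link_header):
    """Extract the rel="next" URL from a Github-style Link header, or None."""
    if not link_header:
        return None
    # the header is a comma-separated list of <url>; rel="..." entries
    for part in link_header.split(","):
        section = part.split(";")
        if len(section) < 2:
            continue
        url = section[0].strip().strip("<>")
        for param in section[1:]:
            if param.strip() == 'rel="next"':
                return url
    return None
```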

Each api call returns a list of 100 repositories, so to retrieve data about 100,000 repositories, 1000 api calls are required.

All the output is saved to a file called all_repos.jsonl which came to around 13MB.

The next step is to follow the languages_url api endpoint for each repository and save the data.

[Read More]