Zunayed Ali Morsalin Home

Using two heaps to keep running list of medians

Tags: heaps

Let’s say you wanted to compute the running median of a steam of numbers. A brute force solution would be to store all the elements and regularly sort it. This would have a time comlexity of O(n^2)

We can achieve a O(logn) runtime by maintaining two heaps, a max heap and a min heap. One of the heaps will have an extra value. If we have seen odd numbers we just take that extra value that is sorted in the max heap. If it is even we take the max from the max heap and the min from the min heap and calculate the medium

max heap

Putting it together in code

import heapq

def running_medium(streaming_list):
    max_heap = [] # store smallest -> midpoint
    min_heap = [] # midpoint -> largest

    result = []

    for num in streaming_list:
        largest_from_min_heap = heapq.heappop(min_heap)

        if num < largest_from_min_heap:
            heapq.heappush(max_heap, num)
            heapq.heappush(min_heap, largest_from_min_heap)
            heapq.heappush(min_heap, num)
            # max heap using negative because python doesn't support it natively 
            heapq.heappush(max_heap, -largest_from_min_heap)

        if len(max_heap) > len(min_heap):
            heapq.heappush(min_heap, -heapq.heappop(max_heap))

        if len(min_heap) == len(max_heap):
            result.append(min_heap[0] + (-max_heap[0]) * .5)

    return result