Summary of Common Data Structures and Algorithms

Strings

We denote a string as s

Common String Operations

s.split(delimiter) - O(n)
s.replace(str1, str2, k) - O(n*k) or O(n) if k is omitted
- replaces the first k instances of str2 as str1
- Note: Since Python strings are immutable, we can delete substrings by doing s.replace(str1, "", n)
s[i].isdigit() - Checks if the string at index i is an integer

Making a string frequency array using `ord()`

s = "jaydenlimisthebestprogrammer"
freq = [0] * 26
for ch in s:
    idx = ord(ch) - ord('a')
    freq[idx] += 1

Converting str to int

If you're iterating a string and you want to extract an integer, but the integer could possibly span more than 1 digit, you can do so by:

curr_num = 0
s = "123"

for ch in s:
    if ch.isdigit():
        curr_num = curr_num * 10 + int(ch)

KMP Algorithm

Longest Prefix Suffix (lps)
- lps[i] = longest prefix that's also a suffix, but not including the entire string from s[:i]

Lists/Arrays

Let l be a list and k be an integer

Number of subsequences: 2^n
Number of sub-arrays: n^2

Slicing

l[:k] = start at index k and go to the end
l[-k:] = start from the k-th to last element and go to the end

Iterating a list backwards

n = len(nums)
for i in range(n - 1, -1, -1):
    print(nums[i])

Lambda Functions

Using a lambda function on `Counter` object, sorting based on frequency in descending order

    cnt = Counter(nums)
    sorted_cnt = sorted(cnt.items(), key=lambda x : x[1], reverse=True)

Queues/Deques

# Create a deque
dq = deque([1, 2, 3])  # Initialize with a list
dq = deque()  # Create an empty deque

dq.append(4)  # deque becomes [1, 2, 3, 4]
dq.appendleft(0)  # deque becomes [0, 1, 2, 3, 4]

# Remove from the right (end)
dq.pop()  # Removes 4; deque becomes [0, 1, 2, 3]

# Remove from the left (front)
dq.popleft()  # Removes 0; deque becomes [1, 2, 3]

front = dq[0]  # Access the first element (1)
end = dq[-1]   # Access the last element (3)

Stacks

How does one know if you're encountering a stack problem?

Dependency on most recent elements: if a problem requires reasoning about the most recent item that satisfies some condition
- Ex: Next Greater Element, Valid Parentheses (need next unmatched open bracket), Daily Temperatures (next warmer day)
Look for nested or paired structure ({, [, ()
Look for monotonic relationships: maintaining a running maximum, minimum, or any order-based property
- Maintain a monotonic stack - one that is always increasing or decreasing

stack = []
stack.append(element)

# Removes and returns top element
top = stack.pop()

# Accesses the top element without popping it
top = stack[-1]

# checking if stack is empty
if not stack:
	...

An algorithm done using stacks can also be done via recursion. Use the following pattern:

When you push something -> recursive call
When you pop on something -> recursive return

Binary Search

`bisect_right(sorted_list, x)`

Gives you the index where x would be inserted, to the right of any equal elements

Suppose we work with tuples:

bisect_right(history, (0, 0)) -> Gives the first (0, ...) value
bisect_right(history, (0, float('inf'))) -> Gives the last (0, ...) value

Trees

Implementation

class TreeNode:
    def __init__(self, val=val, left=left, right=right):
        self.val = val
        self.left = left
        self.right = right

Common Terminology

rank - position in an in-order traversal
diameter - length of the longest path between any two nodes in a tree. may or may not pass through the root.
depth - distance from the node to the root node
- Generally, depth(node) = depth(parent) + 1

Lowest Common Ancestor

Defined between two nodes, p and q, as the lowest node in T that has both p and q as descendants

Note: A node can be a descendant of itself

If T is a binary tree

def lca(root: TreeNode, p: TreeNode, q: TreeNode) -> TreeNode:
    if not root:
        return None

    if root == p or root == q:
        return root

    left = lca(root.left, p, q)
    right = lca(root.right, p, q)

    if left and right:
        return root
    return left or right

Binary Tree Implementation

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

N-ary Tree Implementation

class Node:
    def __init__(self, val: Optional[int] = None, children: Optional[List['Node']] = None):
        if children is None:
            children = []
        self.val = val
        self.children = children

BFS Template - O(n)

from typing import Optional, List
from collections import deque

class Solution:
    def levelOrder(self, root: Optional[TreeNode]) -> List[List[int]]:
        if not root:
            return []
        q = deque([root])
        ans = []
        while q:
            level = []
            for i in range(len(q)):
                current = q.popleft()
                level.append(current.val)
                if current.left:
                    q.append(current.left)
                
                if current.right:
                    q.append(current.right)
    
            ans.append(level)
        return ans

Morris In-Order Traversal

An in-order traversal that's memory efficient - from O(h) to O(1)

Union Find

Union Find is essentially a Forest of Trees and we perform either Union by Rank (height) or Union by Size (# of nodes)

For two given nodes, u and v, when we perform the union of these 2 components, the component whose parent has the larger size becomes the parent of the other node.

Application of Union Find:

Count number of connected components
Cycle detection
- If calling union(u, v)returns False, then u and v were already connected prior -> cycle detection

Time Complexity = O(α(n)) ~= O(1) due to path compression and by doing union by rank/size * This is also known as inverse Ackermann time Space Complexity = O(n)

Union by Size Implementation

class UnionFind:
    
    def __init__(self, n: int):
        self.parent = [i for i in range(n)]
        self.size = [1] * n
        self.num_components = n

    def find(self, x: int) -> int:
        # find the root of x
        if x != self.parent[x]:
            # path compression
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
        
    def isSameComponent(self, x: int, y: int) -> bool:
        return self.find(x) == self.find(y)

    def union(self, x: int, y: int) -> bool:
        # connect x and y
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            if self.size[root_x] < self.size[root_y]:
                self.parent[root_x] = root_y
                self.size[root_y] += self.size[root_x]
            else:
                self.parent[root_y] = root_x
                self.size[root_x] += self.size[root_y]
            self.num_components -= 1
            return True
        # not same component
        return False
        

    def getNumComponents(self) -> int:
        return self.num_components

Graphs

Topological Sort

Given a DAG, G = (V, E) toposort(G) returns a linear ordering of all vertices such that for every (u, v) in E, vertex u appears before v in the ordering

Once you reach v, all of its predecessors have already been processed

Time Complexity = Space Complexity = O(n + m)

How the DFS works

Given some src node, we want to visit every descending node of that src BEFORE visiting the src node
- Why? Take this ex:
```
a -> b -> c
  -> d
```
- We want to visit both (b, c) and (d) before a

Why reverse() at the end? * We append node after exploring all its neighbors, which is post-order * Ex: edges = [[0, 1], [1, 2]] * The graph is 0 → 1 → 2 * DFS appends in the order [2, 1, 0] * You need to reverse() it to get [0, 1, 2]

Heaps/Priority Queues

Common Operations + Time Complexities

insert - O(log n)
get min/max - O(1)
extract min/max - O(log n)
update - O(log n) if we have the index, else O(n)
build - O(n)

Common Operations of `heapq` object

heapify(iterable) - O(n)
heappush(iterable, x) - O(log n)
heappop(iterable) - O(log n)
heapushpop(iterable, x) - O(log n)
nsmallest(k, iterable, key) - O(n log k)
nlargest(k, iterable, key) - O(n log k)
- k: The number of largest elements to return.
- iterable: The input data we are selecting from.
- key: A function that determines the value to compare when finding the largest elements. Example usage: Top K Largest Elements

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        if len(nums) == k:
            return nums
        cnt = Counter(nums)
        return heapq.nlargest(k, cnt.keys(), key=cnt.get)

key=cnt.get - The cnt.get function retrieves the frequency count of each key. heapq.nlargest() uses this frequency count to determine the "largest" elements.

Binary Heap

Typically represented as an array

Complete binary tree
- Every level is completely filled, with the exception of the last row, which is filled from left to right
Heap property
- Min Heap: Each node is less than or equal to its children
- Max Heap: Each node is greater than or equal to its children

Useful Info

For a given node at index i, we can find...

left child: 2*i+1
right child: 2*i+2
parent: (i - 1) // 2

Dijkstra's Algorithm

Given a weighted, directed graph, and a starting vertex, return the shortest distance from the starting vertex to every vertex in the graph.

Time complexity: O(E log V), where V = # of nodes, E = # of edges Space Complexity: O(E + V)

Bellman Ford Algorithm

Given a weighted, directed graph (which may contain negative weights), and a starting vertex, return the shortest distance from the starting vertex to every vertex in the graph.

Time complexity: O(EV), where V = # of nodes, E = # of edges Space Complexity: O(E + V)

Inputs:

n - the number of vertices in the graph, where (2 <= n <= 100). Each vertex is labeled from 0 to n - 1
edges - a list of tuples, each representing a directed edge in the form (u, v, w), where u is the source vertex, v is the destination vertex, and w is the weight of the edge, where (1 <= w <= 10)
src - the source vertex from which to start the algorithm, where (0 <= src < n)

    def dijkstra(n: int, edges: List[List[int]], src: int) -> List[int]:
        G = {i: [] for i in range(n)}
        for u, v, w in edges:
            G[u].append((v, w))

        heap = [(0, src)]
        dist = [float('inf')] * n
        dist[src] = 0
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:
                continue
            
            for nei, w in G[node]:
                new_dist = d + w
                if new_dist < dist[nei]:
                    dist[nei] = new_dist
                    heapq.heappush(heap, (new_dist, nei))
        shortest = {}
        for i in range(n):
            shortest[i] = dist[i]
        return shortest

Quickselect Algorithm

Typically used to solve problems that are "find k something" such as

kth smallest, kth largest, kth most frequent, kth less frequent

Time Complexity

Average case: O(n)
Worst case: O(n^2)

Select a pivot and form partitions
Run the partitioning algorithm only on the side that contains our "top" k-th element

Bit Manipulation

XOR Operator - `^`

Any number XORed with itself cancels out
- Useful for this problem: Single Number
Order of operations does not matter, meaning:

Math

Prime numbers
- A positive integer > 1 that has exactly two distinct divisors: 1 and itself
- 1 is not a prime number
- Any multiple of a prime number is not itself prime
- gcd(a, b) == 1
  - code to find gcd(a, b) manually found in /math/gcd.py
Least Common Multiple: Smallest positive integer that is a multiple of both a and b
- Multiples of 4 → 4, 8, 12, 16, 20, 24, ...
- Multiples of 6 → 6, 12, 18, 24, 30, ...
- The smallest common multiple is 12 → so LCM(4, 6) = 12.
- Formula: lcm(a, b) = a*b / gcd(a, b)

Miscellaneous

Collections Data Structures

from collections import Counter
from collections import defaultdict

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
counter		counter
divide_&_conquer		divide_&_conquer
dp		dp
graphs		graphs
heaps		heaps
lists		lists
math		math
shortest_path		shortest_path
sorting		sorting
strings		strings
trees		trees
trie		trie
two_ptrs		two_ptrs
.gitignore		.gitignore
README.md		README.md

yi-json/dsa-py

Folders and files

Latest commit

History

Repository files navigation