Skip to content

yi-json/dsa-py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Summary of Common Data Structures and Algorithms

Strings

We denote a string as s

Common String Operations

  • s.split(delimiter) - O(n)
  • s.replace(str1, str2, k) - O(n*k) or O(n) if k is omitted
    • replaces the first k instances of str2 as str1
    • Note: Since Python strings are immutable, we can delete substrings by doing s.replace(str1, "", n)
  • s[i].isdigit() - Checks if the string at index i is an integer

Making a string frequency array using ord()

s = "jaydenlimisthebestprogrammer"
freq = [0] * 26
for ch in s:
    idx = ord(ch) - ord('a')
    freq[idx] += 1

Converting str to int

If you're iterating a string and you want to extract an integer, but the integer could possibly span more than 1 digit, you can do so by:

curr_num = 0
s = "123"

for ch in s:
    if ch.isdigit():
        curr_num = curr_num * 10 + int(ch)

KMP Algorithm

  1. Longest Prefix Suffix (lps)
    • lps[i] = longest prefix that's also a suffix, but not including the entire string from s[:i]

Lists/Arrays

Let l be a list and k be an integer

  • Number of subsequences: 2^n
  • Number of sub-arrays: n^2

Slicing

  • l[:k] = start at index k and go to the end
  • l[-k:] = start from the k-th to last element and go to the end

Iterating a list backwards

n = len(nums)
for i in range(n - 1, -1, -1):
    print(nums[i])

Lambda Functions

Using a lambda function on Counter object, sorting based on frequency in descending order

    cnt = Counter(nums)
    sorted_cnt = sorted(cnt.items(), key=lambda x : x[1], reverse=True) 

Queues/Deques

# Create a deque
dq = deque([1, 2, 3])  # Initialize with a list
dq = deque()  # Create an empty deque

dq.append(4)  # deque becomes [1, 2, 3, 4]
dq.appendleft(0)  # deque becomes [0, 1, 2, 3, 4]

# Remove from the right (end)
dq.pop()  # Removes 4; deque becomes [0, 1, 2, 3]

# Remove from the left (front)
dq.popleft()  # Removes 0; deque becomes [1, 2, 3]

front = dq[0]  # Access the first element (1)
end = dq[-1]   # Access the last element (3)

Stacks

How does one know if you're encountering a stack problem?

  1. Dependency on most recent elements: if a problem requires reasoning about the most recent item that satisfies some condition
    • Ex: Next Greater Element, Valid Parentheses (need next unmatched open bracket), Daily Temperatures (next warmer day)
  2. Look for nested or paired structure ({, [, ()
  3. Look for monotonic relationships: maintaining a running maximum, minimum, or any order-based property
    • Maintain a monotonic stack - one that is always increasing or decreasing
stack = []
stack.append(element)

# Removes and returns top element
top = stack.pop()

# Accesses the top element without popping it
top = stack[-1]

# checking if stack is empty
if not stack:
	...

An algorithm done using stacks can also be done via recursion. Use the following pattern:

  • When you push something -> recursive call
  • When you pop on something -> recursive return

Binary Search

bisect_right(sorted_list, x)

Gives you the index where x would be inserted, to the right of any equal elements

Suppose we work with tuples:

  • bisect_right(history, (0, 0)) -> Gives the first (0, ...) value
  • bisect_right(history, (0, float('inf'))) -> Gives the last (0, ...) value

Trees

Implementation

class TreeNode:
    def __init__(self, val=val, left=left, right=right):
        self.val = val
        self.left = left
        self.right = right

Common Terminology

  • rank - position in an in-order traversal
  • diameter - length of the longest path between any two nodes in a tree. may or may not pass through the root.
  • depth - distance from the node to the root node
    • Generally, depth(node) = depth(parent) + 1

Lowest Common Ancestor

Defined between two nodes, p and q, as the lowest node in T that has both p and q as descendants

  • Note: A node can be a descendant of itself

If T is a binary tree

def lca(root: TreeNode, p: TreeNode, q: TreeNode) -> TreeNode:
    if not root:
        return None

    if root == p or root == q:
        return root

    left = lca(root.left, p, q)
    right = lca(root.right, p, q)

    if left and right:
        return root
    return left or right

Binary Tree Implementation

class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = self.right = None

N-ary Tree Implementation

class Node:
    def __init__(self, val: Optional[int] = None, children: Optional[List['Node']] = None):
        if children is None:
            children = []
        self.val = val
        self.children = children

BFS Template - O(n)

from typing import Optional, List
from collections import deque

class Solution:
    def levelOrder(self, root: Optional[TreeNode]) -> List[List[int]]:
        if not root:
            return []
        q = deque([root])
        ans = []
        while q:
            level = []
            for i in range(len(q)):
                current = q.popleft()
                level.append(current.val)
                if current.left:
                    q.append(current.left)
                
                if current.right:
                    q.append(current.right)
    
            ans.append(level)
        return ans

Morris In-Order Traversal

An in-order traversal that's memory efficient - from O(h) to O(1)

Union Find

Union Find is essentially a Forest of Trees and we perform either Union by Rank (height) or Union by Size (# of nodes)

  • For two given nodes, u and v, when we perform the union of these 2 components, the component whose parent has the larger size becomes the parent of the other node.

Application of Union Find:

  • Count number of connected components
  • Cycle detection
    • If calling union(u, v)returns False, then u and v were already connected prior -> cycle detection

Time Complexity = O(α(n)) ~= O(1) due to path compression and by doing union by rank/size * This is also known as inverse Ackermann time Space Complexity = O(n)

Union by Size Implementation

class UnionFind:
    
    def __init__(self, n: int):
        self.parent = [i for i in range(n)]
        self.size = [1] * n
        self.num_components = n

    def find(self, x: int) -> int:
        # find the root of x
        if x != self.parent[x]:
            # path compression
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
        
    def isSameComponent(self, x: int, y: int) -> bool:
        return self.find(x) == self.find(y)

    def union(self, x: int, y: int) -> bool:
        # connect x and y
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            if self.size[root_x] < self.size[root_y]:
                self.parent[root_x] = root_y
                self.size[root_y] += self.size[root_x]
            else:
                self.parent[root_y] = root_x
                self.size[root_x] += self.size[root_y]
            self.num_components -= 1
            return True
        # not same component
        return False
        

    def getNumComponents(self) -> int:
        return self.num_components

Graphs

Topological Sort

Given a DAG, G = (V, E) toposort(G) returns a linear ordering of all vertices such that for every (u, v) in E, vertex u appears before v in the ordering

Once you reach v, all of its predecessors have already been processed

Time Complexity = Space Complexity = O(n + m)

How the DFS works

  • Given some src node, we want to visit every descending node of that src BEFORE visiting the src node
    • Why? Take this ex:
      a -> b -> c
        -> d
      
    • We want to visit both (b, c) and (d) before a

Why reverse() at the end? * We append node after exploring all its neighbors, which is post-order * Ex: edges = [[0, 1], [1, 2]] * The graph is 0 → 1 → 2 * DFS appends in the order [2, 1, 0] * You need to reverse() it to get [0, 1, 2]

Heaps/Priority Queues

Common Operations + Time Complexities

  • insert - O(log n)
  • get min/max - O(1)
  • extract min/max - O(log n)
  • update - O(log n) if we have the index, else O(n)
  • build - O(n)

Common Operations of heapq object

  • heapify(iterable) - O(n)
  • heappush(iterable, x) - O(log n)
  • heappop(iterable) - O(log n)
  • heapushpop(iterable, x) - O(log n)
  • nsmallest(k, iterable, key) - O(n log k)
  • nlargest(k, iterable, key) - O(n log k)
    • k: The number of largest elements to return.
    • iterable: The input data we are selecting from.
    • key: A function that determines the value to compare when finding the largest elements. Example usage: Top K Largest Elements
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        if len(nums) == k:
            return nums
        cnt = Counter(nums)
        return heapq.nlargest(k, cnt.keys(), key=cnt.get)
  • key=cnt.get - The cnt.get function retrieves the frequency count of each key. heapq.nlargest() uses this frequency count to determine the "largest" elements.

Binary Heap

Typically represented as an array

  1. Complete binary tree
    • Every level is completely filled, with the exception of the last row, which is filled from left to right
  2. Heap property
    • Min Heap: Each node is less than or equal to its children
    • Max Heap: Each node is greater than or equal to its children

Useful Info

For a given node at index i, we can find...

  • left child: 2*i+1
  • right child: 2*i+2
  • parent: (i - 1) // 2

Dijkstra's Algorithm

Given a weighted, directed graph, and a starting vertex, return the shortest distance from the starting vertex to every vertex in the graph.

Time complexity: O(E log V), where V = # of nodes, E = # of edges Space Complexity: O(E + V)

Bellman Ford Algorithm

Given a weighted, directed graph (which may contain negative weights), and a starting vertex, return the shortest distance from the starting vertex to every vertex in the graph.

Time complexity: O(EV), where V = # of nodes, E = # of edges Space Complexity: O(E + V)

Inputs:

  • n - the number of vertices in the graph, where (2 <= n <= 100). Each vertex is labeled from 0 to n - 1
  • edges - a list of tuples, each representing a directed edge in the form (u, v, w), where u is the source vertex, v is the destination vertex, and w is the weight of the edge, where (1 <= w <= 10)
  • src - the source vertex from which to start the algorithm, where (0 <= src < n)
    def dijkstra(n: int, edges: List[List[int]], src: int) -> List[int]:
        G = {i: [] for i in range(n)}
        for u, v, w in edges:
            G[u].append((v, w))

        heap = [(0, src)]
        dist = [float('inf')] * n
        dist[src] = 0
        while heap:
            d, node = heapq.heappop(heap)
            if d > dist[node]:
                continue
            
            for nei, w in G[node]:
                new_dist = d + w
                if new_dist < dist[nei]:
                    dist[nei] = new_dist
                    heapq.heappush(heap, (new_dist, nei))
        shortest = {}
        for i in range(n):
            shortest[i] = dist[i]
        return shortest

Quickselect Algorithm

Typically used to solve problems that are "find k something" such as

  • kth smallest, kth largest, kth most frequent, kth less frequent

Time Complexity

  • Average case: O(n)
  • Worst case: O(n^2)
  1. Select a pivot and form partitions
  2. Run the partitioning algorithm only on the side that contains our "top" k-th element

Bit Manipulation

XOR Operator - ^

  • Any number XORed with itself cancels out
  • Order of operations does not matter, meaning:

Math

  • Prime numbers
    • A positive integer > 1 that has exactly two distinct divisors: 1 and itself
    • 1 is not a prime number
    • Any multiple of a prime number is not itself prime
    • gcd(a, b) == 1
      • code to find gcd(a, b) manually found in /math/gcd.py
  • Least Common Multiple: Smallest positive integer that is a multiple of both a and b
    • Multiples of 4 → 4, 8, 12, 16, 20, 24, ...
    • Multiples of 6 → 6, 12, 18, 24, 30, ...
    • The smallest common multiple is 12 → so LCM(4, 6) = 12.
    • Formula: lcm(a, b) = a*b / gcd(a, b)

Miscellaneous

Collections Data Structures

  • from collections import Counter
  • from collections import defaultdict

About

A hub of algorithm/data structure templates/notes in Python

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages