Python FAQ: Top Questions

30. How does Python's garbage collection work?

Python manages memory automatically. Its **garbage collection** machinery reclaims the memory occupied by objects that are no longer in use, so developers do not allocate and free memory by hand, which removes a whole class of memory-related bugs. The details below describe CPython, the reference implementation.

Python's primary garbage collection strategy relies on two main mechanisms:

1. Reference Counting (Primary Mechanism):

  • Concept: Every object in Python has a **reference count**, which is an integer that keeps track of how many references (variables, container objects, etc.) are pointing to that object.
  • Mechanism:
    • When an object is created, its reference count is 1.
    • When a new reference points to the object, its reference count increases.
    • When a reference to the object is removed (e.g., variable goes out of scope, variable is reassigned, `del` statement), its reference count decreases.
    • When an object's reference count drops to **zero**, it means there are no more references to that object anywhere in the program. At this point, the object is immediately deallocated, and the memory it occupied is reclaimed.
  • Pros:
    • **Immediate Reclamation:** Memory is typically reclaimed as soon as it's no longer needed, reducing memory footprint.
    • **Simplicity:** Conceptually easy to understand.
  • Cons:
    • **Cannot handle Reference Cycles:** This is the main limitation. If two or more objects refer to each other, forming a cycle, their reference counts will never drop to zero, even if they are no longer reachable from the rest of the program. This leads to a memory leak if not addressed.
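
The counting rules above can be observed with `sys.getrefcount()`. The following is a minimal sketch, assuming CPython; the `Payload` class and the variable names are hypothetical and exist only for illustration. It compares differences between counts rather than absolute values, because `sys.getrefcount()` itself holds a temporary reference while it runs.

```python
import sys

class Payload:
    """Hypothetical placeholder class used only for this illustration."""

item = Payload()
base = sys.getrefcount(item)       # includes getrefcount's own temporary reference

holder = [item, item]              # the list stores two more references
with_list = sys.getrefcount(item)

alias = item                       # a second variable adds one more reference
with_alias = sys.getrefcount(item)

del alias                          # drop the extra variable
holder.clear()                     # emptying the list releases its two references
back_to_base = sys.getrefcount(item)

print("added by list: ", with_list - base)
print("added by alias:", with_alias - with_list)
print("net change:    ", back_to_base - base)
```

Container slots count exactly like variables: each place that stores the object contributes one reference.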

2. Generational Cyclic Garbage Collector (Secondary Mechanism):

  • Concept: To address the problem of reference cycles, Python has a supplemental, generational garbage collector that runs periodically. This collector specifically identifies and reclaims objects involved in reference cycles that are no longer reachable from the root objects (e.g., global variables, active stack frames).
  • Mechanism:
    • **Generations:** Objects are grouped into "generations" based on their age. New objects start in the youngest generation (generation 0). If an object survives a garbage collection pass, it gets promoted to an older generation (generation 1, then generation 2).
    • **Collection Frequency:** The collector runs more frequently on younger generations because most objects are short-lived. Older generations are collected less often, saving CPU cycles.
    • **Cycle Detection:** The collector traverses objects within a generation (and potentially older ones) to find groups of objects that have non-zero reference counts but are unreachable from the active program. Once such cycles are identified, their memory is reclaimed.
  • Pros:
    • **Handles Reference Cycles:** Solves the primary limitation of reference counting.
    • **Efficiency (Generational):** By focusing on younger objects more often, it optimizes for common allocation patterns where many objects are created and discarded quickly.
  • Cons:
    • **Non-deterministic:** It runs periodically, so memory isn't always reclaimed immediately.
    • **Overhead:** Involves some CPU overhead during collection passes.
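
One way to watch the cyclic collector at work without relying on `__del__` is to keep a weak reference to an object inside a cycle. This is a sketch under stated assumptions: it targets CPython, and the `Link` class and `make_cycle()` helper are hypothetical names introduced here. A weak reference does not raise the reference count, so it can report whether the object is still alive.

```python
import gc
import weakref

class Link:
    """Hypothetical class; each instance can point at a partner."""
    def __init__(self):
        self.partner = None

def make_cycle():
    # Build two objects that refer to each other, then return only a weak
    # reference, so no strong reference from outside the cycle remains.
    a, b = Link(), Link()
    a.partner, b.partner = b, a
    return weakref.ref(a)

gc.disable()                         # keep automatic collection out of the way
gc.collect()                         # clear any garbage that already exists

probe = make_cycle()
alive_before = probe() is not None   # the cycle keeps both objects alive

found = gc.collect()                 # returns the number of unreachable objects found
alive_after = probe() is not None    # the weak reference is now dead

gc.enable()

print("alive before collect:", alive_before)
print("unreachable objects found:", found)
print("alive after collect:", alive_after)
```

The exact number returned by `gc.collect()` can vary between Python versions, so the sketch only reports it rather than relying on a specific value.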

When does garbage collection happen?

  • **Reference Counting:** Immediately when a reference count drops to zero.
  • **Cyclic Collector:**
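    • Automatically, when the number of container-object allocations minus deallocations since the last collection exceeds a threshold. You can inspect these thresholds with `gc.get_threshold()` and change them with `gc.set_threshold()`.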
    • Manually, using `gc.collect()`.
    • When the program exits (though this is less about reclaiming memory and more about clean shutdown).
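
The triggers listed above can be inspected and adjusted through the `gc` module. The snippet below is a minimal sketch, not a tuning recommendation: doubling the first threshold is an arbitrary, hypothetical choice made only to show the API, and the original values are restored afterwards.

```python
import gc

original = gc.get_threshold()        # tuple of three thresholds
print("collector enabled:", gc.isenabled())
print("thresholds:", original)
print("current counts:", gc.get_count())

# Hypothetical tuning: make generation 0 collections roughly half as frequent.
gc.set_threshold(original[0] * 2, original[1], original[2])
tuned = gc.get_threshold()
print("tuned thresholds:", tuned)

# Restore the original settings so the rest of the program is unaffected.
gc.set_threshold(*original)
restored = gc.get_threshold()
print("restored thresholds:", restored)
```

Raising the first threshold trades fewer collection pauses for cyclic garbage living longer, so any change is best measured against a real workload.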

In summary, Python's memory management is a hybrid approach: fast, immediate reference counting handles most object deallocations, while a more sophisticated generational garbage collector handles the trickier cases of reference cycles.


import gc
import sys

# --- Example 1: Reference Counting ---
print("--- Reference Counting ---")

class MyObject:
    def __del__(self):
        print(f"MyObject instance {id(self)} is being deallocated.")

obj1 = MyObject()
obj_id = id(obj1)
print(f"Object created: {obj_id}")
print(f"Reference count for obj1: {sys.getrefcount(obj1) - 1}") # -1 because getrefcount itself adds a temporary reference

obj2 = obj1 # obj2 now refers to the same object
print(f"New reference (obj2) added. Ref count for {obj_id}: {sys.getrefcount(obj1) - 1}")

del obj2 # Remove one reference
print(f"Reference obj2 deleted. Ref count for {obj_id}: {sys.getrefcount(obj1) - 1}")

del obj1 # Remove the last reference, object should be deallocated
print(f"Reference obj1 deleted. Object {obj_id} is now gone.")

# If you try to access obj1 or obj2 now, it would be a NameError


# --- Example 2: Reference Cycles and Cyclic Garbage Collector ---
print("\n--- Reference Cycles ---")

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
        print(f"Node '{self.name}' created (id: {id(self)}).")

    def __del__(self):
        print(f"Node '{self.name}' (id: {id(self)}) is being deallocated.")

# Disable automatic garbage collection for demonstration (don't do this in production!)
gc.disable()
print("Automatic garbage collection disabled for demonstration.")

# Create a reference cycle
node_a = Node("A")
node_b = Node("B")

node_a.next = node_b # A refers to B
node_b.next = node_a # B refers to A (cycle created!)

a_id = id(node_a)
b_id = id(node_b)

print(f"Node A ref count: {sys.getrefcount(node_a) - 1}") # Should be 2 (node_a, node_b.next)
print(f"Node B ref count: {sys.getrefcount(node_b) - 1}") # Should be 2 (node_b, node_a.next)

del node_a # Remove the direct reference to A
del node_b # Remove the direct reference to B

print("Direct references node_a and node_b deleted.")
# sys.getrefcount() needs the object itself, not its id, so check liveness by id instead.
print(f"Node A still tracked by the GC: {any(id(o) == a_id for o in gc.get_objects())}")
print(f"Node B still tracked by the GC: {any(id(o) == b_id for o in gc.get_objects())}")

# Observe that __del__ is NOT called immediately for node_a and node_b
# because their ref counts are still 1 due to the cycle.
print("Nodes A and B are now unreachable but not deallocated (due to cycle and GC disabled).")

# Manually run the cyclic garbage collector
print("\nManually running cyclic garbage collector...")
gc.collect() # This will detect and collect the cycle
print("Cyclic garbage collector finished.")

# Now the __del__ methods should have been called.
print("Object A and B should now be deallocated.")

# Re-enable automatic garbage collection (good practice)
gc.enable()
print("Automatic garbage collection re-enabled.")

# --- Example 3: Inspecting GC thresholds ---
print("\n--- GC Thresholds ---")
thresholds = gc.get_threshold()
print(f"Current GC thresholds (gen0, gen1, gen2): {thresholds}")
print("These thresholds determine when the generational collector runs.")

# You can manually trigger a collection, but it's rarely necessary for typical apps
# gc.collect(0) # Collects only generation 0
# gc.collect(1) # Collects up to generation 1
# gc.collect(2) # Collects all generations
        

Explanation of the Example Code:

  • **Reference Counting Example:**
    • We define `MyObject` with a `__del__` method, which is called when an object is about to be deallocated.
    • When `obj1` is created, its ref count is 1.
    • Assigning `obj2 = obj1` increases the ref count to 2.
    • `del obj2` decreases it to 1.
    • `del obj1` decreases it to 0, at which point the `MyObject` instance is immediately deallocated, and its `__del__` method is called. This demonstrates immediate reclamation.
    • `sys.getrefcount()` is used to check the reference count. Note that it includes the temporary reference created by `getrefcount()` itself, so we subtract 1.
  • **Reference Cycles Example:**
    • We create two `Node` objects, `node_a` and `node_b`.
    • The crucial part is `node_a.next = node_b` and `node_b.next = node_a`, which creates a circular reference. Both `node_a` and `node_b` now have a reference count of 2 (one direct variable, one from the other node).
    • We explicitly `gc.disable()` to prevent the cyclic collector from running automatically and confusing the demonstration.
    • When `del node_a` and `del node_b` are executed, the variable references are removed. However, each object still has a reference count of 1, because the other node's `next` attribute points to it. Their `__del__` methods are **not called**. This shows that reference counting alone cannot reclaim them; with the cyclic collector disabled, the memory stays allocated.
    • Calling `gc.collect()` manually triggers the generational cyclic garbage collector. The collector determines that, although their reference counts are not zero, the two nodes are no longer reachable from the active program. It calls their `__del__` methods (safe for objects in cycles since Python 3.4, see PEP 442), then breaks the cycle and deallocates them.
  • **GC Thresholds Example:**
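    • `gc.get_threshold()` returns the current thresholds as a tuple of three values. The first is an allocation threshold: when the number of tracked (container) object allocations minus deallocations since the last collection exceeds it, generation 0 is collected. In the classic three-generation CPython collector, the other two values count collections rather than allocations: generation 1 is examined once generation 0 has been collected more than `threshold1` times, and generation 2 is handled similarly using `threshold2`, subject to extra conditions in recent CPython versions. Collecting an older generation also collects the younger ones.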

The examples highlight that Python's memory management is robust, handling both direct object deallocation via reference counting and the more complex problem of reference cycles through its generational garbage collector.