Memory management in python refers to the allocation and deallocation of objects. It's handled by Python's Garbage Collection Algorithm. We'll try to understand the basics of how memory management works in Python. Memory management in Python is done automatically by the Interpreter and the developer does not need to take any steps. Though Python still provides a module called gc
for explicitly calling Garbage Collector in some cases to tell it explicitly collect objects rather than waiting for automatic behavior. Memory management also helps write better and efficient code to developers.
Python variables are different from other languages like C/C++/Java in that they are just references to Object. They do not have any data type attached to them. Hence you can assign any type of object to Python variable. Objects can have more than one reference pointing to it. Reference count generally refers to a number of references pointing to object. One can use id()
function provided by Python to find out the memory location of the python variable.
Every Python ojbects maintains 3 things internally.
Type
- Type of objectValue
- Value of ObjectRefCount
- Reference CountPython provides simple objects like a number, string and container objects like list,dict, user-defined classes, etc.
Let’s define a few variables and find out their memory location using id()
function.
import sys
print(sys.version)
x = 10
y = x
t = 10
print(id(x), id(y), id(t))
Notice above that integer object with value 10 is created and variables x,y,t
are all pointing to it. Hence it's reference count is currently 3. Also notice that id()
function returns same value for all 3 references.
z = 'hello'
l = 'hello'
k = l
i = 'Hello World'
j = 'Hello World'
print(id(z), id(l), id(k), id(i), id(j))
Notice above that first string is creating the same id which means that both are pointing to the same objects. Wheres in case of string with space in it is getting created as two separate objects. One can see it from id()
output above.
Let’s create a few container objects below.
lst = ['hello',10, 'Hello World']
print(id(lst[0]), id(lst[1]), id(lst[2]))
Note above that id of first and 2nd object of the list is the same as ids mentioned above for those objects. But for 3rd object, it created a new copy of that object. Now integer object with value 10 will have 4 references including 3 defined above and one mentioned in the list. The same will be for string hello
which will also have 4 reference count.
print(dir())
There are 3 ways to decrease the reference count of an object:
del
statementNone
to a variable name.del x
x
y, t, id(y), id(t)
Note above that even though we deleted reference x from memory. Integer object with value 10 is still accessible through y
and t
both. When we above pointed y
to x
, it pointed to integer object with value 10. Also, note that id
is still same.
When you call del
on objects created from user-defined classes then it'll call their __del__()
method if it's implemented in class.
Let’s try to use None
now.
y = None
y
t, lst[1], id(t), id(lst[1])
Please make a note that calling del
statement on reference name or pointing reference to None
does not actually delete the object. It just removes that reference variable name as a reference to that object which means that object will no longer be accessible through that reference.
Calling del
statement or setting variable to None
only decreases the reference count of that object by 1. The pointing reference variable to some other object has the same effect on reference count.
Note: Don't confuse del
statement with __del__
. del
statement does not call __del__
method declared in some classes. It just decreases ref count by 1 and that's all. __del__
method also called destructor
sometimes gets executed when the reference count of that object reaches zero before removing it from memory.
Object's reference count can also decrease when it goes out of scope. See below example where the variable is created inside a function and when function call completes, it's reference count goes to 0 and it becomes inaccessible which can be seen as NameError
in the next cell.
When reference count of object goes to 0, it becomes accessible by Garbage Collector.
def test():
title = 'Hello'
print(title)
test()
title
Objects defined in local scope like inside of functions, loops, classes, etc goes out of scope as soon as the function, the loop ends. But objects defined in global space exists until the program ends and does not become eligible to be collected by Garbage Collector because their reference count does not get 0. Hence think wisely before declaring complex objects in global space else you'll end up using memory even though you might not be using that global object always.
Garbage collection is an algorithm that automatically releases/frees memory from objects which are no longer in use.
Python uses 2 types of Garbage Collection algorithms:
Reference counting refers to counting references to an object and deleting that object once it's reference count reaches 0. There are a few pros and cons to this method. Pros:
Cons:
Please find below the example for an explanation of cyclical references.
class Node(object):
def __init__(self, value):
self.value = value
def next(self, next):
self.next = next
root = Node('Root')
left = Node('Left')
right = Node('Right')
root.next(left)
left.next(right)
right.next(left)
In the above code, if you notice the root node has 1 reference count(variable named root), the left node has 3 reference count(variable named left, root and right) and the right node has 2 reference count (variable named right and left).
Here we have cycled as left is referring to right and right is referring to left.
Now if we try to remove references which are pointed by variable names like root, left and right then their reference count will go down by one which will result in collecting root node. But left and right won't be collected because they still have references to them internally. This way these objects become inaccessible and won't be collected by reference counting garbage collectors.
del root, left,right
Referencing counting alone is not able to collect objects with cyclical references.
Tracing garbage collection uses an algorithm called Mark and Sweep behind the scene.
It has 2 steps:
Mark
- We take an object which is reachable and mark all objects which are reachable from that object.Sweep
- When marking is done then the sweep step will remove all other objects which are not reachable.Generational Garbage collection is a type of Tracing Garbage collection.
To provide a generational garbage collection facility, Python generally keeps a list of every object created as the program is run. Also make a note that only Container objects
whose reference count is greater than 0 will be stored in this list because they keep a reference to other objects not simple objects like int, float, bool, etc. Container objects are responsible for creating cyclical reference issues.
It keeps 3 such lists and all program objects are kept in exactly one of generation list:
generation 0
- Newly created objects are kept in this list.generation 1
- Objects moved to this generation when it gets a little old and still getting referenced.generation 2
- Objects which are oldest will be in this generation. It has objects which will be used as the program executes.When a number of objects in a generation reach some threshold set then it runs garbage collection (Tracing - Mark and Sweep) on that generation and any generation younger than it. So if generation 2 reaches threshold then generation 1 and generation 0 will be swept as well. It makes a list of objects to discard and also detect cycles. If an object does not have outside references as we discussed above in Mark and Sweep, then they'll be put onto the discard list to be collected by the garbage collector. Then the garbage collection cycle will run and empty discard list.
Once the garbage collection cycle has completed discarding objects then whichever objects survived will be moved to one generation higher than the current one because they still have reference and will be used by the program again in the future. One can think that generation 0 will be more prone to reach it's threshold because most of the objects die young (have a little scope like loops, function calls, etc.) which won't be promoted to higher generations more.
When the reference count reaches 0 then you get immediate garbage collection, but when you have cyclical references then we need to wait for generational garbage collection to reach its threshold for generation in which object is residing.
If you want to