Eng. Elias Owis · Aug 1, 2023
Python, with its user-friendly syntax and extensive libraries, has emerged as a versatile and widely used programming language across various domains. However, its interpreted nature often leads to performance bottlenecks, especially when dealing with computationally intensive tasks. Traditionally, developers have turned to languages like C++, C#, Rust and JavaScript for improved execution speed. In this article, we explore Numba, a game-changing library that enables Python to compete with these languages by harnessing the power of just-in-time (JIT) compilation. We will delve into Numba’s features, compare Python with Numba against other languages, explore additional examples showcasing Numba’s capabilities, and discuss when and where to use it effectively.
Numba, an open-source project backed by Anaconda, has revolutionized Python’s performance landscape by providing a JIT compiler that translates Python code into optimized machine code. Unlike traditional Python interpreters, Numba compiles Python functions on-the-fly, yielding remarkable speed-ups by leveraging the Low-Level Virtual Machine (LLVM) infrastructure. The result is highly efficient native machine code that rivals the performance of compiled languages like C++.
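To make the on-the-fly compilation concrete, here is a minimal sketch that times two consecutive calls of a jitted function. The function name `sum_of_squares` and the helper `timed_call` are illustrative, not part of Numba. When Numba is installed, the first call is expected to include compilation and the second to reuse the compiled code. The `try`/`except` fallback is an assumption added only so the snippet still runs where Numba is missing, in which case both calls are plain interpreted Python.

```python
import time

try:
    from numba import njit
except ImportError:
    # Assumption: no-op stand-in so the sketch still runs without Numba
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # Plain Python loop; Numba compiles it on the first call
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_call(n):
    """Return (result, elapsed_seconds) for one call of sum_of_squares."""
    start = time.perf_counter()
    result = sum_of_squares(n)
    return result, time.perf_counter() - start


# With Numba installed, the first call includes JIT compilation time
first_result, first_time = timed_call(1_000_000)
# The second call reuses the already compiled machine code
second_result, second_time = timed_call(1_000_000)

print("First call: ", first_time, "seconds")
print("Second call:", second_time, "seconds")
```

When benchmarking, this is the reason a warm-up call before the timed call can matter: otherwise the measurement mixes compilation and execution.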
Let’s create a code example for an algorithm that performs a brute-force search to count all prime numbers within a given range. Brute-force searching for prime numbers can be computationally expensive, especially for larger ranges. We’ll implement this algorithm in C++, C#, JavaScript, Rust, Python without Numba, and Python with Numba. We will compare the performance and execution time of these implementations.
A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. Brute-force searching involves checking each number within the given range to determine if it is prime. We’ll use a simple function to check for prime numbers.
To evaluate the performance of the implementations, we’ll run the algorithms with a large list of numbers and measure the execution time.
(The execution times reported below were measured on my laptop.)
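As a small sketch of how such measurements can be made a little more robust, the helper below runs a function several times and keeps the best time. The name `best_of` and the use of `time.perf_counter` are my own choices for illustration; the implementations in this article time a single run instead.

```python
import time


def best_of(func, *args, repeats=3):
    """Run func(*args) `repeats` times; return (last_result, best_seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return result, best


def workload(n):
    # Hypothetical stand-in for the code under test
    return sum(range(n))


result, seconds = best_of(workload, 100_000)
print("Result:", result)
print("Best execution time:", seconds, "seconds")
```

Taking the minimum over a few runs reduces the influence of background activity on the machine.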
- C++ Implementation:
#include <iostream>
#include <ctime>

// Trial division: test divisors up to the square root of num
bool is_prime(int num)
{
    if (num <= 1)
        return false;
    for (int i = 2; i * i <= num; ++i)
    {
        if (num % i == 0)
            return false;
    }
    return true;
}

int find_primes(int start, int end)
{
    int count = 0;
    for (int num = start; num <= end; ++num)
    {
        if (is_prime(num))
        {
            count++;
        }
    }
    return count;
}

int main()
{
    int start = 0;
    int end = 10000000;
    // Find primes and measure execution time
    clock_t start_time = clock();
    int primes_count = find_primes(start, end);
    clock_t end_time = clock();
    double execution_time = static_cast<double>(end_time - start_time) / CLOCKS_PER_SEC;
    std::cout << "Execution time: " << execution_time << " seconds" << std::endl;
    // Report the number of primes counted by find_primes
    std::cout << "Total prime numbers found: " << primes_count << std::endl;
    return 0;
}
C++ Execution Time: 8.9 seconds
- C# Implementation:
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class Program
{
    public static bool IsPrime(int num)
    {
        if (num <= 1) return false;
        for (int i = 2; i * i <= num; ++i)
        {
            if (num % i == 0) return false;
        }
        return true;
    }

    public static int FindPrimes(int start, int end)
    {
        int count = 0;
        for (int num = start; num <= end; ++num)
        {
            if (IsPrime(num))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        int start = 0;
        int end = 10000000;
        // Find primes and measure execution time
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        int primes_count = FindPrimes(start, end);
        stopwatch.Stop();
        double executionTime = stopwatch.Elapsed.TotalSeconds;
        Console.WriteLine("Execution time: " + executionTime + " seconds");
        Console.WriteLine("Total prime numbers found: " + primes_count);
    }
}
C# Execution Time: 9.0 seconds
- Rust Implementation:
use std::time::Instant;

fn is_prime(num: i32) -> bool {
    if num <= 1 {
        return false;
    }
    for i in 2..=((num as f64).sqrt() as i32) {
        if num % i == 0 {
            return false;
        }
    }
    true
}

fn find_primes(start: i32, end: i32) -> i32 {
    let mut count = 0;
    for num in start..=end {
        if is_prime(num) {
            count += 1;
        }
    }
    count
}

fn main() {
    let start = 0;
    let end = 10000000;
    // Find primes and measure execution time
    let start_time = Instant::now();
    let primes_count = find_primes(start, end);
    let end_time = Instant::now();
    let execution_time = end_time.duration_since(start_time).as_secs_f64();
    println!("Execution time: {} seconds", execution_time);
    println!("Total prime numbers found: {}", primes_count);
}
Rust Execution Time: 16.2 seconds
- JavaScript Implementation:
function isPrime(num) {
    if (num <= 1) return false;
    for (let i = 2; i * i <= num; ++i) {
        if (num % i === 0) return false;
    }
    return true;
}

function findPrimes(start, end) {
    let count = 0;
    for (let num = start; num <= end; ++num) {
        if (isPrime(num)) {
            count++;
        }
    }
    return count;
}

function main() {
    const start = 0;
    const end = 10000000;
    // Find primes and measure execution time
    const startTime = new Date();
    const primes_count = findPrimes(start, end);
    const endTime = new Date();
    const executionTime = (endTime - startTime) / 1000;
    console.log("Execution time:", executionTime, "seconds");
    console.log("Total prime numbers found:", primes_count);
}

main();
JS Execution Time: 8.9 seconds
- Python Implementation (Without Numba):
import time

def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

def find_primes(start, end):
    count = 0
    for num in range(start, end + 1):
        if is_prime(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()
Python Execution Time: 101.9 seconds (too slow)
- Python Implementation (With Numba):
import time
import numba

@numba.jit
def is_prime_numba(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

@numba.njit(fastmath=True, cache=True, parallel=True)
def find_primes_numba(start, end):
    # return [num for num in numba.prange(start, end + 1) if is_prime_numba(num)]
    count = 0
    # prange distributes the iterations across threads; count is a reduction variable
    for num in numba.prange(start, end + 1):
        if is_prime_numba(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    # (the first call also includes JIT compilation unless a cached build is available)
    start_time = time.time()
    primes_count = find_primes_numba(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time (with Numba):", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()
Python with Numba Execution Time: 2.3 seconds (the fastest)
After running the code above, Python with Numba outperformed the C++ implementation in execution time when counting the prime numbers in the range 0 to 10,000,000. This result might seem surprising at first, since C++ is traditionally known for superior performance compared to Python because it is a compiled language. However, the Numba version combines just-in-time (JIT) compilation with parallel execution of the outer loop across CPU cores, whereas the other implementations shown here run on a single thread, and that combination lets the Python code achieve a significant speedup.
Numba’s @numba.jit decorator and @numba.njit(parallel=True) option enable efficient compilation and parallel execution of the code, respectively. The combination of Numba's capabilities allows the Python code to be heavily optimized for numerical computations and computationally intensive tasks such as prime number searching.
During the execution, Numba effectively translates the Python code into optimized machine code, reducing the overhead associated with Python’s interpreter and improving the code’s performance. Additionally, the use of parallel processing with Numba’s numba.prange function allows the code to leverage multiple CPU cores, maximizing computational power.
As a result, in this benchmark Python with Numba surpasses the performance of the C++ implementation, showcasing how Numba can elevate Python’s capabilities for numerical computations and computationally demanding algorithms. This combination of simplicity and performance makes Python with Numba an excellent choice for tasks that require both speed and ease of development. It allows developers to write high-level Python code while achieving performance that was traditionally associated with low-level languages like C++.
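The parallel count described above is, conceptually, a reduction: each worker counts the primes in its own slice of the range and the partial counts are added at the end. The sketch below mirrors only that structure with standard-library threads. The names `count_primes_in_chunk` and `count_primes_chunked` are hypothetical, this is not how Numba is implemented internally, and CPython threads will not make this loop faster because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(num):
    if num <= 1:
        return False
    i = 2
    while i * i <= num:
        if num % i == 0:
            return False
        i += 1
    return True


def count_primes_in_chunk(bounds):
    lo, hi = bounds  # half-open interval [lo, hi)
    return sum(1 for num in range(lo, hi) if is_prime(num))


def count_primes_chunked(start, end, num_chunks=4):
    """Count primes in [start, end] by summing per-chunk partial counts."""
    if end < start:
        return 0
    total = end - start + 1
    step = -(-total // num_chunks)  # ceiling division
    chunks = [(lo, min(lo + step, end + 1)) for lo in range(start, end + 1, step)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partial_counts = list(pool.map(count_primes_in_chunk, chunks))
    # The reduction step: combine the independent partial results
    return sum(partial_counts)


print(count_primes_chunked(0, 10_000))  # 1229 primes up to 10,000
```

Because every chunk is counted independently and addition is order-independent, the result does not depend on how the range is split.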
Numba excels in scenarios where performance is critical, and numerical computations, simulations, and scientific calculations form a significant part of the workload. It shines in the following use cases:
Numba enhances the performance of complex scientific algorithms, simulations, and data analysis tasks, providing a significant boost to researchers and scientists.
Example — Numerical Integration (Trapezoidal Rule) Explanation:
The trapezoidal rule is a numerical integration method used to approximate the definite integral of a function. It divides the area under the curve of the function into trapezoids and sums up their areas to approximate the integral.
import time
import numba

def f(x):
    # The function to be integrated
    return x**2

def numerical_integration_without_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral

@numba.jit
def g(x):
    # The function to be integrated
    return x**2

@numba.jit
def numerical_integration_with_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral

def main():
    a = 0.0  # Lower limit of integration
    b = 1.0  # Upper limit of integration
    n = 10000000  # Number of trapezoids
    # Without Numba
    start_time = time.time()
    result_without_numba = numerical_integration_without_numba(f, a, b, n)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Numerical Integration without Numba:")
    print("Result:", result_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    result_with_numba = numerical_integration_with_numba(g, a, b, n)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Numerical Integration with Numba:")
    print("Result:", result_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
Execution Time:
Without Numba: 2.3 seconds
With Numba: 0.3 seconds
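As a quick sanity check of the trapezoidal rule itself, independent of Numba, the sketch below integrates x² on [0, 1] with a few small values of n and compares against the exact value 1/3. For this particular integrand the error works out to 1/(6n²), so doubling n should cut the error by a factor of four. The helper names `trapezoid` and `square` are illustrative.

```python
def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def square(x):
    return x * x


exact = 1.0 / 3.0
errors = {}
for n in (10, 20, 40):
    errors[n] = abs(trapezoid(square, 0.0, 1.0, n) - exact)
    print("n =", n, "error =", errors[n])

# Each doubling of n should reduce the error roughly fourfold
print("error ratio 10 -> 20:", errors[10] / errors[20])
```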
Numba can accelerate various machine learning algorithms, particularly those involving array computations and linear algebra operations, leading to faster model training and predictions.
Example — Linear Regression: Linear regression is a popular supervised learning algorithm used for predicting a continuous target variable based on one or more predictor variables. In this example, we’ll perform simple linear regression with one predictor variable.
import numpy as np
import time
import numba

def linear_regression_without_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2
    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept

@numba.jit
def linear_regression_with_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2
    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept

def main():
    # Generate a large dataset
    np.random.seed(0)
    X = np.random.rand(10000000)  # Predictor variable
    y = 2 * X + 3 + np.random.randn(10000000)  # Target variable (with some noise)
    # Without Numba
    start_time = time.time()
    slope, intercept = linear_regression_without_numba(X, y)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Linear Regression without Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    slope, intercept = linear_regression_with_numba(X, y)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Linear Regression with Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
Execution time:
Without Numba: 7.7 seconds
With Numba: 0.5 seconds
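The slope and intercept formulas used above can be checked by hand on a tiny, noise-free dataset. The sketch below applies the same least-squares formulas in plain Python to points lying exactly on y = 2x + 3, so the fit should recover a slope of 2 and an intercept of 3. The function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 3.0 for x in xs]  # points exactly on y = 2x + 3

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 3.0
```

With the noisy 10,000,000-point dataset from the example, the estimates should land close to these values rather than exactly on them.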
Numba proves invaluable for simulations and solving differential equations, enabling engineers and physicists to achieve results efficiently.
Example — Simulation of Particle Motion with Constant Force Explanation: In this example, we’ll simulate the motion of a particle moving under the influence of a constant force. We’ll use the equations of motion to update the particle’s position and velocity over time.
import time
import numba

def simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity
    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step
    return position

@numba.jit
def simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity
    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step
    return position

def main():
    # Particle parameters
    mass = 1.0
    initial_position = 0.0
    initial_velocity = 0.0
    constant_force = 10.0
    # Simulation parameters
    time_step = 0.01
    num_steps = 10000000
    # Without Numba
    start_time = time.time()
    final_position_without_numba = simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Simulation without Numba:")
    print("Final Position:", final_position_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    final_position_with_numba = simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Simulation with Numba:")
    print("Final Position:", final_position_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
Execution time:
Without Numba: 0.8 seconds
With Numba: 0.2 seconds
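For a constant force starting from rest, the exact position is x(t) = ½·(F/m)·t², which gives a reference to compare the stepping scheme against. The sketch below uses the same update order as the simulation above (velocity first, then position with the updated velocity) on a shorter run of 1,000 steps, an arbitrary choice to keep it quick. With this update order the numerical result overshoots the exact one by (F/m)·Δt·t/2. The function name `simulate` is illustrative.

```python
def simulate(mass, force, time_step, num_steps):
    """Step a particle from rest: update velocity, then position."""
    position, velocity = 0.0, 0.0
    acceleration = force / mass
    for _ in range(num_steps):
        velocity += acceleration * time_step
        position += velocity * time_step
    return position


mass, force, time_step, num_steps = 1.0, 10.0, 0.01, 1000

numeric = simulate(mass, force, time_step, num_steps)
total_time = time_step * num_steps
analytic = 0.5 * (force / mass) * total_time ** 2
expected_gap = (force / mass) * time_step * total_time / 2.0

print("Numerical position:", numeric)
print("Analytic position: ", analytic)
print("Gap:", numeric - analytic, "(predicted:", expected_gap, ")")
```

The gap shrinks in proportion to the time step, so halving `time_step` while doubling `num_steps` should roughly halve it.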
Numba can be employed to optimize financial calculations, such as option pricing, portfolio optimization, and risk analysis, facilitating real-time decision-making.
Example — Option Pricing with Monte Carlo Simulation Explanation: Monte Carlo simulation is a widely used technique for option pricing in finance. It involves simulating the future stock price using random walks and then calculating the option payoff based on the simulated stock prices.
import numpy as np
import time
import numba

def option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0
    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)
        total_payoff += max(S - K, 0)
    # Mean payoff at expiry (not discounted by exp(-r * T))
    option_price = total_payoff / num_simulations
    return option_price

@numba.jit
def option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0
    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)
        total_payoff += max(S - K, 0)
    # Mean payoff at expiry (not discounted by exp(-r * T))
    option_price = total_payoff / num_simulations
    return option_price

def main():
    # Option parameters
    S0 = 100.0  # Initial stock price
    K = 100.0  # Strike price
    r = 0.05  # Risk-free interest rate
    sigma = 0.2  # Volatility (standard deviation of returns)
    T = 1.0  # Time to expiration (in years)
    # Monte Carlo simulation parameters
    num_simulations = 100000  # Number of simulations
    num_steps = 252  # Number of steps (days) for each simulation
    # Without Numba
    start_time = time.time()
    option_price_without_numba = option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Option Pricing without Numba:")
    print("Option Price:", option_price_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    option_price_with_numba = option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Option Pricing with Numba:")
    print("Option Price:", option_price_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
Execution time:
Without Numba: 78.3 seconds
With Numba: 1.4 seconds
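A European call under these assumptions also has a closed-form Black-Scholes price, which makes a handy cross-check for a Monte Carlo estimate. The sketch below is a standard-library-only illustration with hypothetical helper names. It samples the terminal stock price in a single step (for constant r and sigma this has the same distribution as the product of the daily steps) and applies the discount factor exp(-rT), which the functions above leave out because they return the mean payoff at expiry.

```python
import math
import random


def black_scholes_call(S0, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def monte_carlo_call(S0, K, r, sigma, T, num_simulations, seed=0):
    """Discounted mean payoff from sampled terminal prices."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * T
    vol = sigma * math.sqrt(T)
    total_payoff = 0.0
    for _ in range(num_simulations):
        S_T = S0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total_payoff += max(S_T - K, 0.0)
    return math.exp(-r * T) * total_payoff / num_simulations


bs_price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
mc_price = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, 100_000)
print("Black-Scholes price:", bs_price)
print("Monte Carlo price:  ", mc_price)
```

The two numbers should agree to within the Monte Carlo sampling error, which shrinks with the square root of the number of simulations.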
As the following example demonstrates, Numba’s support for parallel processing allows developers to fully utilize multicore processors and tackle large-scale parallel computations efficiently.
Example — Matrix Multiplication with Parallelization Explanation: Matrix multiplication is a computationally intensive task that can benefit from parallelization. We’ll use Numba’s numba.prange function to parallelize the outermost of the nested loops for matrix multiplication, taking advantage of multiple CPU cores.
import numpy as np
import time
import numba

def matrix_multiply_without_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)
    for i in range(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]
    return result

@numba.njit(parallel=True)
def matrix_multiply_with_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)
    for i in numba.prange(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]
    return result

def main():
    # Generate large random matrices
    size = 200
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)
    # Without Numba
    start_time = time.time()
    result_without_numba = matrix_multiply_without_numba(A, B)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Matrix Multiplication without Numba:")
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba Parallelization
    start_time = time.time()
    result_with_numba = matrix_multiply_with_numba(A, B)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Matrix Multiplication with Numba Parallelization:")
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
Execution time:
Without Numba: 4.5 seconds
With Numba: 0.9 seconds
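Parallelizing the outer loop is safe here because each iteration of i writes only to row i of the result, so the rows can be computed in any order, or at the same time, without affecting one another. The plain-Python sketch below illustrates that property by computing the rows in two different orders and comparing the results. The helper names are hypothetical and nested lists stand in for NumPy arrays.

```python
import random


def multiply_row(A, B, i):
    """Compute row i of A x B; touches no other row of the result."""
    n, p = len(B), len(B[0])
    return [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]


def matmul_in_order(A, B, row_order):
    """Multiply A x B, filling result rows in the given order."""
    result = [None] * len(A)
    for i in row_order:
        result[i] = multiply_row(A, B, i)
    return result


rng = random.Random(0)
A = [[rng.random() for _ in range(4)] for _ in range(3)]
B = [[rng.random() for _ in range(2)] for _ in range(4)]

sequential = matmul_in_order(A, B, range(3))
reordered = matmul_in_order(A, B, [2, 0, 1])

print(sequential == reordered)  # True
```

A loop whose iterations shared an accumulator or wrote to overlapping elements would not have this property and would need more care before being parallelized.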
In all the test cases, you will observe a noticeable advantage in using Numba when dealing with large datasets. The functions optimized with Numba consistently outperform the Python implementations without Numba. As the data size increases, the benefit of using Numba becomes even more pronounced, resulting in significant performance improvements. Numba proves to be a valuable asset in scenarios where enhanced execution speed is crucial, such as scientific computing, machine learning, computational physics, financial modeling, and parallel processing. Its ability to harness the power of just-in-time compilation and parallel processing enables developers to achieve remarkable performance gains, especially when dealing with extensive and computationally intensive tasks. As the data scales up, Numba’s impact on speeding up operations becomes increasingly evident, making it an indispensable tool for data-driven applications.
If you’re interested in exploring the code comparisons between Python with Numba and other programming languages or delving deeper into various Numba use cases, you can find the relevant code and examples in the following GitHub repositories:
- Python-Numba-vs-Other-Languages: https://github.com/Eng-Elias/Python-Numba-vs-Other-Languages
- Numba-Use-Cases: https://github.com/Eng-Elias/Numba-Use-Cases
Feel free to explore and contribute to these repositories, fork them, and experiment with the code to gain insights into the potential of Numba for accelerating your own Python projects. Whether you’re a data scientist, software engineer, or programming enthusiast, these repositories aim to offer valuable resources for harnessing Numba’s speed and efficiency in your computational endeavors.
By sharing code comparisons and practical use cases, we hope to encourage and inspire the adoption of Numba in diverse fields, enabling developers to unlock the full potential of Python as a high-performance language.
Happy coding and optimizing!
Numba has undoubtedly proven to be a game-changer for Python developers seeking enhanced performance in computationally intensive tasks. By leveraging Numba’s JIT compilation capabilities, Python can compete with traditionally faster languages like C++, C#, Rust and JavaScript. However, it’s essential to consider the nature of the task at hand when deciding whether to use Numba or not. For numerical computations, simulations, scientific calculations, and algorithms that can benefit from parallelization, Numba can be a valuable addition to the Python developer’s toolbox. When performance is a critical factor, Numba empowers Python developers to achieve optimal execution speeds without sacrificing Python’s simplicity and expressiveness.