Numba: Unleashing the Power of Python for High-Performance Computing (2024)

Eng. Elias Owis

12 min read

Aug 1, 2023


Python, with its user-friendly syntax and extensive libraries, has become a versatile and widely used programming language across many domains. However, because it is interpreted, it often runs into performance bottlenecks on computationally intensive tasks. Traditionally, developers have turned to languages like C++, C#, Rust and JavaScript for better execution speed. In this article, we explore Numba, a library that lets Python compete with these languages by means of just-in-time (JIT) compilation. We will look at Numba’s features, compare Python with Numba against other languages, walk through additional examples of Numba’s capabilities, and discuss when and where Numba is worth using.

Numba, an open-source project backed by Anaconda, has changed Python’s performance landscape by providing a JIT compiler that translates Python code into optimized machine code. Instead of leaving a function to the interpreter, Numba compiles a decorated Python function the first time it is called, using the LLVM (Low-Level Virtual Machine) compiler infrastructure. The result is native machine code that can rival the performance of compiled languages like C++.
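To make the compile-on-first-call behaviour concrete, here is a minimal sketch. The function name `sum_of_squares`, the helper `timed_call`, and the input size are illustrative assumptions, not part of the benchmark in this article. The `try/except` fallback is only there so the snippet still runs where Numba is not installed.

```python
import time

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch runs without Numba.
    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)


@njit
def sum_of_squares(n):
    # A plain Python loop; with Numba it is compiled to machine code on the first call.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_call(n):
    # Return (result, elapsed seconds) for a single call.
    t0 = time.perf_counter()
    result = sum_of_squares(n)
    return result, time.perf_counter() - t0


first_result, first_time = timed_call(100_000)    # includes compilation when Numba is present
second_result, second_time = timed_call(100_000)  # reuses the already compiled code
print(f"first call:  {first_time:.4f} s")
print(f"second call: {second_time:.4f} s")
```

With Numba installed, the first call is typically much slower than the second because it pays the one-time compilation cost. This matters when reading any benchmark that times only a single call.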

Let’s take a computationally heavy example: a brute-force search that counts all prime numbers within a given range. Brute-force searching for prime numbers is expensive, especially for larger ranges. We’ll implement the algorithm in C++, C#, Rust, JavaScript, Python without Numba, and Python with Numba, and then compare the execution times of these implementations.

A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. Brute-force searching involves checking each number within the given range to determine if it is prime. We’ll use a simple function to check for prime numbers.
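Before timing anything, it helps to know what answer every implementation should print. The sketch below uses a Sieve of Eratosthenes, which is a different algorithm from the trial division used in this article, purely as an independent cross-check. The function name `count_primes_sieve` is an assumption made for illustration.

```python
import math


def count_primes_sieve(limit):
    # Count the primes in [0, limit] with a Sieve of Eratosthenes.
    if limit < 2:
        return 0
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0  # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out p*p, p*p + p, ... as composite.
            sieve[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return sum(sieve)


print(count_primes_sieve(100))  # there are 25 primes up to 100
```

For the range 0 to 10,000,000 used in the benchmarks, the expected count is 664,579, so any implementation that reports a different number has a bug.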

To evaluate the performance of the implementations, we’ll run the algorithms with a large list of numbers and measure the execution time.
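The programs in this article time a single run with a wall-clock timer. As a hedged alternative, the sketch below repeats the measurement and keeps the fastest run, which tends to be less sensitive to background noise. The helper name `best_of` and its `repeats` parameter are assumptions for illustration.

```python
import time


def best_of(func, *args, repeats=3):
    # Run func(*args) several times; return (last result, fastest wall-clock time).
    best = float("inf")
    result = None
    for _ in range(repeats):
        t0 = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - t0)
    return result, best


total, seconds = best_of(sum, range(1_000_000))
print(f"sum = {total}, best of 3: {seconds:.4f} s")
```

For a JIT-compiled function, taking the best of several runs also keeps the one-time compilation cost of the first call out of the reported number.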

(The execution times reported below were measured on my laptop.)

C++ Implementation:

#include <iostream>
#include <ctime>

// Trial division: test every divisor i with i * i <= num.
bool is_prime(int num)
{
    if (num <= 1)
        return false;
    for (int i = 2; i * i <= num; ++i)
    {
        if (num % i == 0)
            return false;
    }
    return true;
}

// Count the primes in the inclusive range [start, end].
int find_primes(int start, int end)
{
    int count = 0;
    for (int num = start; num <= end; ++num)
    {
        if (is_prime(num))
        {
            count++;
        }
    }
    return count;
}

int main()
{
    int start = 0;
    int end = 10000000;
    // Find primes and measure execution time
    clock_t start_time = clock();
    int primes_count = find_primes(start, end);
    clock_t end_time = clock();
    double execution_time = static_cast<double>(end_time - start_time) / CLOCKS_PER_SEC;
    std::cout << "Execution time: " << execution_time << " seconds" << std::endl;
    std::cout << "Total prime numbers found: " << primes_count << std::endl;
    return 0;
}

C++ Execution Time: 8.9 seconds

C# Implementation:

using System;
using System.Collections.Generic;
using System.Diagnostics;

public class Program
{
    public static bool IsPrime(int num)
    {
        if (num <= 1) return false;
        for (int i = 2; i * i <= num; ++i)
        {
            if (num % i == 0) return false;
        }
        return true;
    }

    public static int FindPrimes(int start, int end)
    {
        int count = 0;
        for (int num = start; num <= end; ++num)
        {
            if (IsPrime(num))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        int start = 0;
        int end = 10000000;
        // Find primes and measure execution time
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        int primes_count = FindPrimes(start, end);
        stopwatch.Stop();
        double executionTime = stopwatch.Elapsed.TotalSeconds;
        Console.WriteLine("Execution time: " + executionTime + " seconds");
        Console.WriteLine("Total prime numbers found: " + primes_count);
    }
}

C# Execution Time: 9.0 seconds

Rust Implementation:

use std::time::Instant;

fn is_prime(num: i32) -> bool {
    if num <= 1 {
        return false;
    }
    for i in 2..=((num as f64).sqrt() as i32) {
        if num % i == 0 {
            return false;
        }
    }
    true
}

fn find_primes(start: i32, end: i32) -> i32 {
    let mut count = 0;
    for num in start..=end {
        if is_prime(num) {
            count += 1;
        }
    }
    count
}

fn main() {
    let start = 0;
    let end = 10000000;
    // Find primes and measure execution time
    let start_time = Instant::now();
    let primes_count = find_primes(start, end);
    let end_time = Instant::now();
    let execution_time = end_time.duration_since(start_time).as_secs_f64();
    println!("Execution time: {} seconds", execution_time);
    println!("Total prime numbers found: {}", primes_count);
}

Rust Execution Time: 16.2 seconds

JavaScript Implementation:

function isPrime(num) {
    if (num <= 1) return false;
    for (let i = 2; i * i <= num; ++i) {
        if (num % i === 0) return false;
    }
    return true;
}

function findPrimes(start, end) {
    let count = 0;
    for (let num = start; num <= end; ++num) {
        if (isPrime(num)) {
            count++;
        }
    }
    return count;
}

function main() {
    const start = 0;
    const end = 10000000;
    // Find primes and measure execution time
    const startTime = new Date();
    const primes_count = findPrimes(start, end);
    const endTime = new Date();
    const executionTime = (endTime - startTime) / 1000;
    console.log("Execution time:", executionTime, "seconds");
    console.log("Total prime numbers found:", primes_count);
}

main();

JS Execution Time: 8.9 seconds

Python Implementation (Without Numba):

import time


def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True


def find_primes(start, end):
    count = 0
    for num in range(start, end + 1):
        if is_prime(num):
            count += 1
    return count


def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)


main()

Python Execution Time: 101.9 seconds (too slow)

Python Implementation (With Numba):

import time
import numba


@numba.jit
def is_prime_numba(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True


@numba.njit(fastmath=True, cache=True, parallel=True)
def find_primes_numba(start, end):
    # return [num for num in numba.prange(start, end + 1) if is_prime_numba(num)]
    count = 0
    for num in numba.prange(start, end + 1):
        if is_prime_numba(num):
            count += 1
    return count


def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes_numba(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time (with Numba):", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)


main()

Python with Numba Execution Time: 2.3 seconds (the fastest)

After running the code above, we observed that Python with Numba outperformed the C++ implementation when counting the prime numbers in the range 0 to 10,000,000. This result might seem surprising at first, since C++, as a compiled language, is traditionally known for superior performance compared to Python. However, the C++ version above runs on a single thread, whereas Numba’s just-in-time (JIT) compilation combined with its parallel processing features lets the Python version spread the work across multiple CPU cores.

Numba’s @numba.jit decorator and @numba.njit(parallel=True) option enable efficient compilation and parallel execution of the code, respectively. The combination of Numba's capabilities allows the Python code to be heavily optimized for numerical computations and computationally intensive tasks such as prime number searching.

During the execution, Numba effectively translates the Python code into optimized machine code, reducing the overhead associated with Python’s interpreter and improving the code’s performance. Additionally, the use of parallel processing with Numba’s numba.prange function allows the code to leverage multiple CPU cores, maximizing computational power.
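The parallel counting loop can be checked against a serial version to confirm that parallelism changes only the speed, not the answer. This is a minimal sketch: the function names and the smaller range are assumptions for illustration, and the fallback runs everything serially in plain Python when Numba is not installed.

```python
try:
    from numba import njit, prange
except ImportError:
    # Assumption: without Numba, run the same code serially in plain Python.
    prange = range

    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)


@njit
def is_prime(num):
    # Trial division with integer arithmetic only.
    if num <= 1:
        return False
    i = 2
    while i * i <= num:
        if num % i == 0:
            return False
        i += 1
    return True


@njit
def count_primes_serial(start, end):
    count = 0
    for num in range(start, end + 1):
        if is_prime(num):
            count += 1
    return count


@njit(parallel=True)
def count_primes_parallel(start, end):
    count = 0
    # Numba treats `count += 1` inside a prange loop as a reduction across threads.
    for num in prange(start, end + 1):
        if is_prime(num):
            count += 1
    return count


serial = count_primes_serial(0, 100_000)
parallel = count_primes_parallel(0, 100_000)
print(serial, parallel)  # both versions should report the same count
```

Because each number is tested independently and the only shared state is the counter, this loop is a good fit for `prange`. Loops whose iterations depend on each other are not.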

As a result, in this benchmark Python with Numba runs faster than the single-threaded C++ implementation, showing how Numba can raise Python’s performance for numerical computations and computationally demanding algorithms. This combination of simplicity and performance makes Python with Numba an excellent choice for tasks that require both speed and ease of development. It allows developers to write high-level Python code while achieving performance traditionally associated with lower-level languages like C++.

Numba excels in scenarios where performance is critical, and numerical computations, simulations, and scientific calculations form a significant part of the workload. It shines in the following use cases:

Numba enhances the performance of complex scientific algorithms, simulations, and data analysis tasks, providing a significant boost to researchers and scientists.

Example — Numerical Integration (Trapezoidal Rule) Explanation:
The trapezoidal rule is a numerical integration method used to approximate the definite integral of a function. It divides the area under the curve of the function into trapezoids and sums up their areas to approximate the integral.
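As a quick sanity check on the method itself, the sketch below applies the rule in vectorized NumPy form and compares the result with the exact integral. The function name `trapezoid` and the choice of n = 1000 are assumptions for illustration. For f(x) = x**2 on [0, 1] the exact value is 1/3, and because the second derivative is constant, the rule overshoots by exactly h**2 / 6, where h is the trapezoid width.

```python
import numpy as np


def trapezoid(f, a, b, n):
    # Composite trapezoidal rule with n equal-width trapezoids on [a, b].
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    # End points count half, interior points count fully.
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])


n = 1000
approx = trapezoid(lambda x: x ** 2, 0.0, 1.0, n)
exact = 1.0 / 3.0
print(f"approx = {approx:.10f}, error = {approx - exact:.3e}")
```

Halving the trapezoid width cuts the error by a factor of four, which is why the loop-based version in this section uses a very large number of trapezoids.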

import time
import numba


def f(x):
    # The function to be integrated
    return x**2


def numerical_integration_without_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral


@numba.jit
def g(x):
    # The function to be integrated
    return x**2


@numba.jit
def numerical_integration_with_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral


def main():
    a = 0.0  # Lower limit of integration
    b = 1.0  # Upper limit of integration
    n = 10000000  # Number of trapezoids
    # Without Numba
    start_time = time.time()
    result_without_numba = numerical_integration_without_numba(f, a, b, n)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Numerical Integration without Numba:")
    print("Result:", result_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    result_with_numba = numerical_integration_with_numba(g, a, b, n)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Numerical Integration with Numba:")
    print("Result:", result_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()

Execution Time:

Without Numba: 2.3 seconds

With Numba: 0.3 seconds

Numba can accelerate various machine learning algorithms, particularly those involving array computations and linear algebra operations, leading to faster model training and predictions.

Example — Linear Regression: Linear regression is a popular supervised learning algorithm used for predicting a continuous target variable based on one or more predictor variables. In this example, we’ll perform simple linear regression with one predictor variable.
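For one predictor, least squares has a closed form: the slope is the covariance of X and y divided by the variance of X, and the intercept follows from the two means. The sketch below computes it with vectorized NumPy and compares against `np.polyfit` as a reference. The function name `fit_line` and the smaller dataset are assumptions for illustration.

```python
import numpy as np


def fit_line(x, y):
    # Closed-form least-squares slope and intercept for y = slope * x + intercept.
    x_mean, y_mean = x.mean(), y.mean()
    dx = x - x_mean
    slope = np.dot(dx, y - y_mean) / np.dot(dx, dx)
    return slope, y_mean - slope * x_mean


rng = np.random.default_rng(0)
x = rng.random(10_000)                      # predictor variable
y = 2 * x + 3 + rng.normal(size=x.size)     # target with unit-variance noise
slope, intercept = fit_line(x, y)
ref_slope, ref_intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
print(slope, intercept)
```

The explicit loop used in this section computes the same two sums element by element, which is slow in pure Python and exactly the kind of code Numba compiles well.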

import numpy as np
import time
import numba


def linear_regression_without_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2
    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept


@numba.jit
def linear_regression_with_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2
    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept


def main():
    # Generate a large dataset
    np.random.seed(0)
    X = np.random.rand(10000000)  # Predictor variable
    y = 2 * X + 3 + np.random.randn(10000000)  # Target variable (with some noise)
    # Without Numba
    start_time = time.time()
    slope, intercept = linear_regression_without_numba(X, y)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Linear Regression without Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    slope, intercept = linear_regression_with_numba(X, y)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Linear Regression with Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()

Execution time:

Without Numba: 7.7 seconds

With Numba: 0.5 seconds

Numba proves invaluable for simulations and solving differential equations, enabling engineers and physicists to achieve results efficiently.

Example — Simulation of Particle Motion with Constant Force Explanation: In this example, we’ll simulate the motion of a particle moving under the influence of a constant force. We’ll use the equations of motion to update the particle’s position and velocity over time.
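The update rule used in this example (velocity first, then position with the new velocity) is the semi-implicit Euler scheme. With a constant force it can be summed exactly, which gives a handy check on the loop. The sketch below is illustrative only: the function name `simulate` and the smaller step count are assumptions.

```python
def simulate(mass, force, dt, steps, x0=0.0, v0=0.0):
    # Same update order as the example in this section: velocity, then position.
    x, v = x0, v0
    a = force / mass
    for _ in range(steps):
        v += a * dt
        x += v * dt
    return x


mass, force, dt, steps = 1.0, 10.0, 0.01, 1000
a = force / mass
t = steps * dt

numeric = simulate(mass, force, dt, steps)
discrete = a * dt * dt * steps * (steps + 1) / 2  # exact sum of the update rule
analytic = 0.5 * a * t * t                        # continuous-time solution
print(numeric, discrete, analytic)
```

Starting from rest, the loop matches the discrete sum up to rounding and exceeds the continuous solution by a * t * dt / 2, so the gap shrinks linearly as the time step gets smaller.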

import time
import numba


def simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity
    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step
    return position


@numba.jit
def simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity
    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step
    return position


def main():
    # Particle parameters
    mass = 1.0
    initial_position = 0.0
    initial_velocity = 0.0
    constant_force = 10.0
    # Simulation parameters
    time_step = 0.01
    num_steps = 10000000
    # Without Numba
    start_time = time.time()
    final_position_without_numba = simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Simulation without Numba:")
    print("Final Position:", final_position_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    final_position_with_numba = simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Simulation with Numba:")
    print("Final Position:", final_position_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()

Execution time:

Without Numba: 0.8 seconds

With Numba: 0.2 seconds

Numba can be employed to optimize financial calculations, such as option pricing, portfolio optimization, and risk analysis, facilitating real-time decision-making.

Example — Option Pricing with Monte Carlo Simulation Explanation: Monte Carlo simulation is a widely used technique for option pricing in finance. It involves simulating the future stock price using random walks and then calculating the option payoff based on the simulated stock prices.
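The payoff max(S - K, 0) at expiry corresponds to a European call, for which the Black-Scholes model gives a closed-form price. That makes a useful reference value for the simulation. The sketch below is a standard textbook formula, not code from this article; the function names are assumptions, and it assumes a stock that pays no dividends.

```python
import math


def norm_cdf(x):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    # Closed-form price of a European call on a non-dividend-paying stock.
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"{price:.4f}")
```

With the parameters used in this example (S0 = 100, K = 100, r = 0.05, sigma = 0.2, T = 1) the reference price is about 10.45. A Monte Carlo estimate should approach this value once the average payoff is discounted by exp(-r * T), with a sampling error that shrinks as the number of simulations grows.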

import numpy as np
import time
import numba


def option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0
    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)
        total_payoff += max(S - K, 0)
    # Discount the average payoff back to today to obtain the price
    option_price = np.exp(-r * T) * total_payoff / num_simulations
    return option_price


@numba.jit
def option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0
    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)
        total_payoff += max(S - K, 0)
    # Discount the average payoff back to today to obtain the price
    option_price = np.exp(-r * T) * total_payoff / num_simulations
    return option_price


def main():
    # Option parameters
    S0 = 100.0  # Initial stock price
    K = 100.0  # Strike price
    r = 0.05  # Risk-free interest rate
    sigma = 0.2  # Volatility (standard deviation of returns)
    T = 1.0  # Time to expiration (in years)
    # Monte Carlo simulation parameters
    num_simulations = 100000  # Number of simulations
    num_steps = 252  # Number of steps (days) for each simulation
    # Without Numba
    start_time = time.time()
    option_price_without_numba = option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Option Pricing without Numba:")
    print("Option Price:", option_price_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba
    start_time = time.time()
    option_price_with_numba = option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Option Pricing with Numba:")
    print("Option Price:", option_price_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()

Execution time:

Without Numba: 78.3 seconds

With Numba: 1.4 seconds

As the following example demonstrates, Numba’s support for parallel processing allows developers to make full use of multicore processors and tackle large-scale parallel computations efficiently.

Example — Matrix Multiplication with Parallelization Explanation: Matrix multiplication is a computationally intensive task that can benefit from parallelization. We’ll use Numba’s numba.prange function to parallelize the nested loops for matrix multiplication, taking advantage of multiple CPU cores.
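A hand-written matrix product is easy to get subtly wrong, so it is worth checking against NumPy’s own `@` operator on small inputs. The sketch below does that for a row-parallel triple loop. The function name `matmul_loops` and the matrix sizes are assumptions for illustration, and the fallback runs the loops serially when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: without Numba, run the same loops serially in plain Python.
    prange = range

    def njit(func=None, **kwargs):
        return func if func is not None else (lambda f: f)


@njit(parallel=True)
def matmul_loops(A, B):
    m, n = A.shape
    p = B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)
    for i in prange(m):  # each row of the result is independent of the others
        for j in range(p):
            acc = 0.0
            for k in range(n):
                acc += A[i, k] * B[k, j]
            result[i, j] = acc
    return result


rng = np.random.default_rng(0)
A = rng.random((40, 30))
B = rng.random((30, 20))
C = matmul_loops(A, B)
print(np.allclose(C, A @ B))
```

Parallelizing over rows is safe because no two iterations of the outer loop write to the same element of `result`. For real workloads, `A @ B` usually dispatches to an optimized BLAS routine and is typically faster than explicit loops, so the loop version is best seen as a demonstration of `prange`.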

import numpy as np
import time
import numba


def matrix_multiply_without_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)
    for i in range(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]
    return result


@numba.njit(parallel=True)
def matrix_multiply_with_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)
    for i in numba.prange(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]
    return result


def main():
    # Generate large random matrices
    size = 200
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)
    # Without Numba
    start_time = time.time()
    result_without_numba = matrix_multiply_without_numba(A, B)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time
    print("Matrix Multiplication without Numba:")
    print("Execution time:", execution_time_without_numba, "seconds")
    # With Numba Parallelization
    start_time = time.time()
    result_with_numba = matrix_multiply_with_numba(A, B)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time
    print("Matrix Multiplication with Numba Parallelization:")
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()

Execution time:

Without Numba: 4.5 seconds

With Numba: 0.9 seconds

In every test case above, the function compiled with Numba outperforms its pure-Python counterpart, and the advantage becomes more pronounced as the data size increases. Numba proves to be a valuable asset wherever execution speed is crucial, such as scientific computing, machine learning, computational physics, financial modeling, and parallel processing. Its combination of just-in-time compilation and parallel processing delivers large performance gains on extensive, computationally intensive tasks, making it a strong tool for data-driven applications.

If you’re interested in exploring the code comparisons between Python with Numba and other programming languages or delving deeper into various Numba use cases, you can find the relevant code and examples in the following GitHub repositories:

Python-Numba-vs-Other-Languages:
https://github.com/Eng-Elias/Python-Numba-vs-Other-Languages
Numba-Use-Cases:
https://github.com/Eng-Elias/Numba-Use-Cases

Feel free to explore and contribute to these repositories, fork them, and experiment with the code to gain insights into the potential of Numba for accelerating your own Python projects. Whether you’re a data scientist, software engineer, or programming enthusiast, these repositories aim to offer valuable resources for harnessing Numba’s speed and efficiency in your computational endeavors.

By sharing code comparisons and practical use cases, we hope to encourage and inspire the adoption of Numba in diverse fields, enabling developers to unlock the full potential of Python as a high-performance language.

Happy coding and optimizing!

Numba has undoubtedly proven to be a game-changer for Python developers seeking enhanced performance in computationally intensive tasks. By leveraging Numba’s JIT compilation capabilities, Python can compete with traditionally faster languages like C++, C#, Rust and JavaScript. However, it’s essential to consider the nature of the task at hand when deciding whether to use Numba or not. For numerical computations, simulations, scientific calculations, and algorithms that can benefit from parallelization, Numba can be a valuable addition to the Python developer’s toolbox. When performance is a critical factor, Numba empowers Python developers to achieve optimal execution speeds without sacrificing Python’s simplicity and expressiveness.

