Performance Tips — Numba 0+untagged.871.g53e976f.dirty documentation (2024)

This is a short guide to features present in Numba that can help with obtaining the best performance from code. Two examples are used; both are entirely contrived and exist purely for pedagogical reasons, to motivate discussion. The first is the computation of the trigonometric identity cos(x)^2 + sin(x)^2, and the second is a simple element-wise square root of a vector with reduction over summation. All performance numbers are indicative only and, unless otherwise stated, were taken from running on an Intel i7-4790 CPU (4 hardware threads) with an input of np.arange(1.e7).

Note

A reasonably effective approach to achieving high-performance code is to profile the code running with real data and use that to guide performance tuning. The information presented here is to demonstrate features, not to act as canonical guidance!
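As a minimal sketch of that workflow, the standard library's cProfile can show where time is actually spent before any Numba decorators are added. The `workload` driver below is hypothetical, standing in for a real application, and the input size is arbitrary; in practice you would profile with representative data.

```python
import cProfile
import pstats

import numpy as np


def ident_np(x):
    # the trigonometric identity example used throughout this guide
    return np.cos(x) ** 2 + np.sin(x) ** 2


def workload(x):
    # hypothetical stand-in for a real application calling the kernel repeatedly
    total = 0.0
    for _ in range(5):
        total += ident_np(x).sum()
    return total


x = np.arange(1.e5)  # substitute representative "real" data here
profiler = cProfile.Profile()
result = profiler.runcall(workload, x)

# show the five entries with the largest cumulative time
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)
```

Functions that dominate the cumulative time are the natural candidates for `@njit`.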

NoPython mode

The default mode in which Numba’s @jit decorator operates is nopython mode. This mode is the most restrictive about what can be compiled, but results in faster executable code.

Note

Historically (prior to 0.59.0) the default compilation mode was a fall-back mode whereby the compiler would try to compile in nopython mode and, if that failed, would fall back to object mode. It is likely that you’ll see @jit(nopython=True), or its alias @njit, in use in code/documentation, as this was the recommended best-practice method to force use of nopython mode. Since Numba 0.59.0 this is no longer necessary, as nopython mode is the default mode for @jit.

Loops

Whilst NumPy has developed a strong idiom around the use of vector operations, Numba is perfectly happy with loops too. For users familiar with C or Fortran, writing Python in this style will work fine in Numba (after all, LLVM gets a lot of use in compiling C-lineage languages). For example:

```python
import numpy as np
from numba import njit

@njit
def ident_np(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2

@njit
def ident_loops(x):
    r = np.empty_like(x)
    n = len(x)
    for i in range(n):
        r[i] = np.cos(x[i]) ** 2 + np.sin(x[i]) ** 2
    return r
```

The above run at almost identical speeds when decorated with @njit; without the decorator, the vectorized function is a couple of orders of magnitude faster.

| Function Name | @njit | Execution time |
|---|---|---|
| ident_np | No | 0.581s |
| ident_np | Yes | 0.659s |
| ident_loops | No | 25.2s |
| ident_loops | Yes | 0.670s |
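Because Numba compiles a function on its first call, a fair comparison of the kind shown in the table should time only calls made after compilation. The sketch below assumes that is what you want to measure; `best_time` is a hypothetical helper, and the `try/except` fallback (an assumption for environments without Numba) simply leaves the function as plain Python.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run as plain Python if Numba is not installed
    def njit(*args, **kwargs):
        # supports both the bare @njit and the @njit(...) forms
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit
def ident_loops(x):
    r = np.empty_like(x)
    for i in range(len(x)):
        r[i] = np.cos(x[i]) ** 2 + np.sin(x[i]) ** 2
    return r


def best_time(func, *args, repeat=5):
    """Return the fastest of `repeat` wall-clock timings of func(*args)."""
    func(*args)  # warm-up call: the first call triggers JIT compilation
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


x = np.arange(1.e4)
print(f"ident_loops: {best_time(ident_loops, x):.6f} s")
```

Taking the minimum of several runs reduces the influence of other activity on the machine.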

A Case for Object mode: LoopLifting

Some functions may be incompatible with the restrictive nopython mode but contain compatible loops. For such functions, setting @jit(forceobj=True) lets Numba attempt to compile those loops in nopython mode (loop-lifting), while the incompatible code segments run in object mode.

Whilst using loop-lifting in object mode can provide some performance increase, compiling functions entirely in nopython mode is key to achieving optimal performance.
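A minimal sketch of a function that suits loop-lifting: the string formatting is handled in object mode, while the numeric loop only touches a float64 array and a float, making it a candidate to be lifted into nopython mode. The function name and its body are hypothetical, and the fallback decorator (used only when Numba is absent) is an assumption so the example still runs.

```python
import numpy as np

try:
    from numba import jit
except ImportError:  # assumption: run as plain Python if Numba is not installed
    def jit(*args, **kwargs):
        # supports both the bare @jit and the @jit(...) forms
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@jit(forceobj=True)
def labelled_sum_of_squares(values, label):
    # str.format on arbitrary objects is handled in object mode
    header = "{}: n={}".format(label, len(values))
    arr = np.asarray(values, dtype=np.float64)
    acc = 0.0
    # this loop uses only a float64 array and a float, so it is a
    # candidate for loop-lifting into nopython mode
    for i in range(arr.shape[0]):
        acc += arr[i] * arr[i]
    return header, float(acc)


print(labelled_sum_of_squares([1, 2, 3], "demo"))  # ('demo: n=3', 14.0)
```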

Fastmath

In certain classes of applications strict IEEE 754 compliance is less important. As a result, it is possible to relax some numerical rigour with a view to gaining additional performance. The way to achieve this behaviour in Numba is through the use of the fastmath keyword argument:

```python
import numpy as np
from numba import njit

@njit(fastmath=False)
def do_sum(A):
    acc = 0.
    # without fastmath, this loop must accumulate in strict order
    for x in A:
        acc += np.sqrt(x)
    return acc

@njit(fastmath=True)
def do_sum_fast(A):
    acc = 0.
    # with fastmath, the reduction can be vectorized as floating point
    # reassociation is permitted.
    for x in A:
        acc += np.sqrt(x)
    return acc
```

| Function Name | Execution time |
|---|---|
| do_sum | 35.2 ms |
| do_sum_fast | 17.8 ms |
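The reason strict order matters is that floating-point addition is not associative, so regrouping the additions changes the rounding. The plain-Python sketch below shows this directly, and then contrasts strict left-to-right accumulation with a reduction over four interleaved partial sums. The four-lane scheme is an assumption about the general shape of a vectorised reduction, not a description of the exact code LLVM generates; `strict_sum` and `lane_sum` are hypothetical helpers.

```python
import numpy as np

# Floating-point addition is not associative: regrouping changes the rounding
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)  # 0.6000000000000001 0.6


def strict_sum(values):
    """Accumulate in strict left-to-right order, as do_sum must without fastmath."""
    acc = 0.0
    for v in values:
        acc += v
    return float(acc)


def lane_sum(values, lanes=4):
    """Keep `lanes` interleaved partial sums and combine them at the end,
    mimicking (by assumption) a SIMD-vectorised reduction."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return float(sum(partial))


A = np.sqrt(np.arange(1.e5))
print(strict_sum(A), lane_sum(A))  # close, but not guaranteed bit-identical
```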

In some cases you may wish to opt in to only a subset of possible fast-math optimizations. This can be done by supplying a set of LLVM fast-math flags to fastmath:

```python
import numpy as np
from numba import njit

def add_assoc(x, y):
    return (x - y) + y

print(njit(fastmath=False)(add_assoc)(0, np.inf))               # nan
print(njit(fastmath=True) (add_assoc)(0, np.inf))               # 0.0
print(njit(fastmath={'reassoc', 'nsz'})(add_assoc)(0, np.inf))  # 0.0
print(njit(fastmath={'reassoc'}) (add_assoc)(0, np.inf))        # nan
print(njit(fastmath={'nsz'}) (add_assoc)(0, np.inf))            # nan
```
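The outputs above are consistent with the optimiser folding (x - y) + y into x only when both 'reassoc' and 'nsz' are allowed. The plain-Python sketch below, where `add_assoc_folded` is a hypothetical stand-in for that folded form, shows what each version computes under strict IEEE 754 rules, and why signed zeros matter: for x = -0.0, y = 0.0 the strict expression yields +0.0 while the folded form returns -0.0.

```python
def add_assoc(x, y):
    return (x - y) + y


def add_assoc_folded(x, y):
    # hypothetical model of the optimised code once 'reassoc' and 'nsz'
    # are permitted: the algebraic identity (x - y) + y == x
    return x


inf = float("inf")

# Strict IEEE 754: (0 - inf) + inf == -inf + inf, which is NaN
print(add_assoc(0.0, inf), add_assoc_folded(0.0, inf))  # nan 0.0

# Why 'nsz' is needed: strictly, (-0.0 - 0.0) + 0.0 is +0.0,
# whereas the folded form returns x itself, i.e. -0.0
print(add_assoc(-0.0, 0.0), add_assoc_folded(-0.0, 0.0))  # 0.0 -0.0
```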

Parallel=True

If code contains operations that are parallelisable (and supported), Numba can compile a version that will run in parallel on multiple native threads (no GIL!). This parallelisation is performed automatically and is enabled by simply adding the parallel keyword argument:

```python
import numpy as np
from numba import njit

@njit(parallel=True)
def ident_parallel(x):
    return np.cos(x) ** 2 + np.sin(x) ** 2
```

Execution times are as follows:

| Function Name | Execution time |
|---|---|
| ident_parallel | 112 ms |

The execution speed of this function with parallel=True present is approximately 5x that of the NumPy equivalent and 6x that of standard @njit.

Numba parallel execution also has support for explicit parallel loop declaration similar to that in OpenMP. To indicate that a loop should be executed in parallel, the numba.prange function should be used; this function behaves like Python range and, if parallel=True is not set, it acts simply as an alias of range. Loops induced with prange can be used for embarrassingly parallel computation and also for reductions.
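A minimal sketch of an embarrassingly parallel prange loop, using the identity example: each iteration writes only its own output element, so iterations are independent. The second function uses the same body without parallel=True, where prange behaves as range. The function names are hypothetical, and the fallback (prange as range, decorators as no-ops) is an assumption for environments without Numba.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # assumption: run as plain Python if Numba is not installed
    prange = range  # mirrors prange's behaviour when parallel=True is not set

    def njit(*args, **kwargs):
        # supports both the bare @njit and the @njit(...) forms
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit(parallel=True)
def ident_prange(x):
    r = np.empty_like(x)
    # iterations are independent (each writes only r[i]), so Numba
    # can split the index space across threads
    for i in prange(x.shape[0]):
        r[i] = np.cos(x[i]) ** 2 + np.sin(x[i]) ** 2
    return r


@njit
def ident_prange_serial(x):
    r = np.empty_like(x)
    # without parallel=True, prange acts simply as range
    for i in prange(x.shape[0]):
        r[i] = np.cos(x[i]) ** 2 + np.sin(x[i]) ** 2
    return r


x = np.arange(1.e4)
print(np.allclose(ident_prange(x), ident_prange_serial(x)))  # True
```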

Revisiting the reduce over sum example, assuming it is safe for the sum to be accumulated out of order, the loop in n can be parallelised through the use of prange. Further, the fastmath=True keyword argument can be added without concern in this case, as the assumption that out-of-order execution is valid has already been made through the use of parallel=True (as each thread computes a partial sum).

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def do_sum_parallel(A):
    # each thread can accumulate its own partial sum, and then a cross
    # thread reduction is performed to obtain the result to return
    n = len(A)
    acc = 0.
    for i in prange(n):
        acc += np.sqrt(A[i])
    return acc

@njit(parallel=True, fastmath=True)
def do_sum_parallel_fast(A):
    n = len(A)
    acc = 0.
    for i in prange(n):
        acc += np.sqrt(A[i])
    return acc
```

Execution times are as follows; fastmath again improves performance.

| Function Name | Execution time |
|---|---|
| do_sum_parallel | 9.81 ms |
| do_sum_parallel_fast | 5.37 ms |
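To see why the parallel reduction already implies reordering, the plain-Python sketch below models its general shape: each hypothetical "thread" sums over its own contiguous chunk, and the partial sums are combined at the end. This is a simulation under the assumption of contiguous chunking, not Numba's actual scheduler; `chunked_sum_sqrt` and `serial_sum_sqrt` are hypothetical helpers.

```python
import numpy as np


def chunked_sum_sqrt(A, n_threads=4):
    """Model of a parallel reduction: per-chunk partial sums, then a
    final cross-'thread' combination of those partial sums."""
    partials = []
    for chunk in np.array_split(A, n_threads):  # one chunk per 'thread'
        acc = 0.0
        for x in chunk:
            acc += np.sqrt(x)
        partials.append(acc)
    return float(sum(partials))  # the cross-thread reduction


def serial_sum_sqrt(A):
    """Strict left-to-right accumulation, as in do_sum."""
    acc = 0.0
    for x in A:
        acc += np.sqrt(x)
    return float(acc)


A = np.arange(1.e4)
print(serial_sum_sqrt(A), chunked_sum_sqrt(A))  # agree closely, order differs
```

With a single chunk the model reduces to the strict serial order; with several chunks the additions are regrouped, which is exactly the freedom that also makes fastmath safe to add here.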

Intel SVML

Intel provides a short vector math library (SVML) that contains a large number of optimised transcendental functions available for use as compiler intrinsics. If the intel-cmplr-lib-rt package is present in the environment (or the SVML libraries are simply locatable!) then Numba automatically configures the LLVM back end to use the SVML intrinsic functions wherever possible. SVML provides both high and low accuracy versions of each intrinsic, and the version that is used is determined through the use of the fastmath keyword. The default is to use high accuracy, which is accurate to within 1 ULP; however, if fastmath is set to True then the lower accuracy versions of the intrinsics are used (answers to within 4 ULP).
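To make the 1 ULP versus 4 ULP distinction concrete, the sketch below measures an error in units in the last place with `math.ulp` (Python 3.9+). The perturbed value is artificial, constructed to sit exactly 4 ULP away as a stand-in for a worst-case low-accuracy result; `ulp_error` is a hypothetical helper, and using `math.cos` as the reference is itself an approximation since it is also rounded.

```python
import math


def ulp_error(approx, exact):
    """Distance between approx and exact in units in the last place of exact."""
    return abs(approx - exact) / math.ulp(exact)


exact = math.cos(1.0)
# artificial value exactly 4 ULP away, the bound quoted for the
# lower-accuracy SVML variants
low_acc = exact + 4 * math.ulp(exact)
print(ulp_error(low_acc, exact))  # 4.0
```

For a value near 0.54, one ULP is 2**-53 (about 1.1e-16), so even the lower-accuracy variants differ from the reference only in the last few bits.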

First obtain SVML, using conda for example:

```shell
conda install intel-cmplr-lib-rt
```

Note

The SVML library was previously provided through the icc_rt conda package. The icc_rt package has since become a meta-package and, as of version 2021.1.1, it has intel-cmplr-lib-rt amongst other packages as a dependency. Installing the recommended intel-cmplr-lib-rt package directly results in fewer installed packages.

Rerunning the identity function example ident_np from above with various combinations of options to @njit and with/without SVML yields the following performance results (input size np.arange(1.e8)). For reference, with just NumPy the function executed in 5.84s:

| @njit kwargs | SVML | Execution time |
|---|---|---|
| None | No | 5.95s |
| None | Yes | 2.26s |
| fastmath=True | No | 5.97s |
| fastmath=True | Yes | 1.8s |
| parallel=True | No | 1.36s |
| parallel=True | Yes | 0.624s |
| parallel=True, fastmath=True | No | 1.32s |
| parallel=True, fastmath=True | Yes | 0.576s |

It is evident that SVML significantly increases the performance of this function. The impact of fastmath when SVML is not present is negligible (5.95s versus 5.97s serial, 1.36s versus 1.32s parallel); this is expected, as there is nothing in the original function that would benefit from relaxing numerical strictness.

Linear algebra

Numba supports most of numpy.linalg in nopython mode. The internal implementation relies on a LAPACK and BLAS library to do the numerical work, and it obtains the bindings for the necessary functions from SciPy. Therefore, to achieve good performance in numpy.linalg functions with Numba, it is necessary to use a SciPy built against a well-optimised LAPACK/BLAS library. In the case of the Anaconda distribution, SciPy is built against Intel’s MKL, which is highly optimised, and as a result Numba makes use of this performance.
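A minimal sketch of calling numpy.linalg from an @njit function. The `import scipy` line reflects the dependency described above; the fallback branch (an assumption, used only when SciPy or Numba is missing) runs the same code as plain NumPy. `solve_with_residual` is a hypothetical name, and the 2x2 system is chosen so the exact solution is [2, 3].

```python
import numpy as np

try:
    import scipy  # noqa: F401  Numba obtains its LAPACK/BLAS bindings from SciPy
    from numba import njit
except ImportError:  # assumption: run as plain NumPy if SciPy or Numba is missing
    def njit(*args, **kwargs):
        # supports both the bare @njit and the @njit(...) forms
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit
def solve_with_residual(a, b):
    # np.linalg.solve is handed off to the underlying LAPACK library
    x = np.linalg.solve(a, b)
    residual = np.linalg.norm(np.dot(a, x) - b)
    return x, residual


a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x, residual = solve_with_residual(a, b)
print(x, residual)
```

How fast the LAPACK call runs depends on which library SciPy was built against, which is why the choice of SciPy build matters here.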
