If you are a newcomer to Cython like me, you have probably been confused about when to use np.float64 and when to use np.float64_t.
Below, I’ll briefly explain how these two types differ fundamentally, and generalize the idea to other data types as well.
Before that, we need to know a bit of Cython.
cimport in Cython
In Python, we use the
import statement to access functions, objects, and classes inside other modules and packages.
Cython also fully supports the
import statement, so it allows us to access Python objects defined in external Python modules.
However, if that were the end of the story, Cython modules would not be able to access other Cython modules’
ctypedefs or structs, and they would have no C-level access to other extension types.
To remedy this, Cython has a
cimport statement that provides compile-time access to C-level constructs. It looks up the declarations of these constructs in separate Cython files called definition files, which have a .pxd extension and which we have to create ourselves.
In a .pxd file, we place the declarations of the C-level constructs that we wish to share; only the declarations placed there are cimportable. Also, since the
some_file_name.pxd file we created has the same base name as the original file
some_file_name.pyx, Cython treats the two as a single namespace. Therefore, we need to edit
some_file_name.pyx and remove the declarations that are now repeated in the .pxd file.
Once we have created the .pxd file and cleaned up the .pyx file, an external implementation file can access the shared C-level constructs via the cimport statement.
Let’s take a real-world example to see how cimport works.
cimport numpy as np
In some files of the well-known machine learning library scikit-learn (such as this one), you can find the following code snippet:
cimport numpy as np
import numpy as np
I remember I was stunned when I saw these lines for the first time, WHAT IS IT?
Well, the good news is that we now know the basics of the
cimport statement, so we can figure it out step by step.
First, since only the declarations in a .pxd file are cimportable, we have to identify which .pxd file Cython looks for when it compiles
cimport numpy as np.
After a bit of research, we should find that a file called
__init__.pxd lies in the numpy folder under our Cython installation. An
__init__.pxd file can make Cython treat the directory as a package just like how
__init__.py works for Python (see here). Therefore, in this case Cython will treat the numpy folder as a package and give us access to NumPy’s C API, as declared in the
__init__.pxd file, at compile time.
In contrast,
import numpy as np only gives us access to NumPy’s pure-Python API, and it takes effect at runtime.
Note that here we use the same alias (i.e.,
np) for both of the imported packages, but thanks to the almighty Cython, which handles this ambiguity internally, we do not need to use different names.
np.float64 vs. np.float64_t
So here comes our main topic: what is the difference between np.float64 and
np.float64_t, and which should I use?
np.float64 is a Python
type object that is defined at the Python level to represent 64-bit floating-point data, and it has common attributes such as
__name__ that most other Python objects have too. You can use the following code to verify it:
import numpy as np

print(type(np.float64))
print(np.float64)
print(np.float64.__name__)
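To make this a little more concrete, here is a small sketch in plain Python (not Cython) that checks a few properties of np.float64. It assumes NumPy is installed, and the variable names are only for illustration.

```python
import numpy as np

# np.float64 is itself a class (a Python type object), not a C type.
is_type_object = isinstance(np.float64, type)

# It also inherits from Python's built-in float.
is_float_subclass = issubclass(np.float64, float)

# Calling it creates a NumPy scalar object that holds a 64-bit float.
x = np.float64(1.5)

# In an array, each float64 element occupies 8 bytes.
item_size = np.dtype(np.float64).itemsize

print(is_type_object, is_float_subclass, type(x).__name__, item_size)
```

All of these checks run at the Python level, which is exactly where np.float64 lives.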
In __init__.pxd, you can find the following lines:
ctypedef double npy_float64
ctypedef npy_float64 float64_t
So it is clear that
np.float64_t represents the type
double in C, and it is not a Python object at all. Therefore, if you call
print np.float64_t in a .pyx file, compilation will fail with the following error message:
'float64_t' is not a constant, variable or function identifier
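As a rough analogy outside of Cython, the standard library's ctypes and struct modules can show what a C double looks like from Python: 8 bytes of raw memory, with no attributes such as __name__ attached. This sketch assumes CPython on a platform where a C double is a 64-bit IEEE 754 float. It does not touch np.float64_t itself, which only exists at the C level.

```python
import ctypes
import struct
import sys

# Size of a C double in bytes, as reported by ctypes.
c_double_size = ctypes.sizeof(ctypes.c_double)

# Raw bytes of a C double holding 0.1, in native byte order.
packed = struct.pack("d", 0.1)

# A Python float object wraps the same value plus object overhead
# (reference count, type pointer), so it is larger than 8 bytes.
py_float_size = sys.getsizeof(0.1)

# Round trip: the raw bytes carry the value and nothing else.
(restored,) = struct.unpack("d", packed)

print(c_double_size, len(packed), py_float_size, restored)
```

The gap between the two sizes is the Python object machinery that a C-level type like double does without.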
Which to use?
Let’s take another simple example to illustrate when to use each of these two types:
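import numpy as np
cimport numpy as np

def test():
    # 1: declare a typed buffer; the element type is the C-level np.float64_t
    cdef np.ndarray[np.float64_t, ndim=1] array
    # 2: create the array at the Python level; dtype is the Python-level np.float64
    array = np.empty(10, dtype=np.float64)
    print(array)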
In step 1, we use np.ndarray to declare the type of the object exposing the buffer interface, and place the C data type of the array elements inside the brackets. So, we should make sure we use
np.float64_t here to specify the element data type.
In step 2, to initialize the NumPy buffer we just declared, we create an array object at the Python level and assign it to the buffer. In this case, we should use
np.float64, since we are not declaring a C-typed variable.
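The pairing between the two names can also be checked from plain Python. The sketch below assumes NumPy is installed and uses a hypothetical helper, fits_float64_buffer (not part of NumPy or Cython), to mimic what a buffer declared as np.ndarray[np.float64_t, ndim=1] expects: a one-dimensional array whose elements are 64-bit floats.

```python
import numpy as np


def fits_float64_buffer(arr):
    """Hypothetical helper: True if arr is a 1-D array of 64-bit floats."""
    return (
        isinstance(arr, np.ndarray)
        and arr.ndim == 1
        and arr.dtype == np.float64
    )


good = np.empty(10, dtype=np.float64)            # matches the declaration
wrong_dtype = np.empty(10, dtype=np.float32)     # elements are C float, not double
wrong_ndim = np.empty((2, 5), dtype=np.float64)  # two dimensions, not one

print(fits_float64_buffer(good))
print(fits_float64_buffer(wrong_dtype))
print(fits_float64_buffer(wrong_ndim))
```

In compiled Cython code, as far as I know, assigning an array like wrong_dtype to the typed buffer fails at runtime with a buffer dtype mismatch error rather than being converted silently.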
Of course, the same concept can be generalized to other data types (e.g.,
After working with Cython for a month, I have found debugging in Cython both hard and frustrating because the documentation is not very thorough.
Consequently, I hope this blog post can save you some effort by clarifying the difference between data types defined in Cython and those defined in Python.
In the future, I will also document more of my findings about Cython during my GSoC.