NumPy

NumPy (short for Numerical Python) provides an efficient interface for storing and operating on arrays of numbers. NumPy arrays look somewhat like Python’s built-in list type, but they are much more efficient as the arrays grow in size.

import numpy as np
np.__version__
## '1.8.0rc1'

How does NumPy improve performance compared to base Python operations on arrays of data?

Python is a dynamically typed language, unlike C and Java, which are statically typed languages that require the type of each variable to be declared explicitly.

Let’s take a C code example

/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}

The equivalent operation in Python can be written as:

result = 0
for i in range(100):
    result += i

Notice the difference: in C the data type of each variable is explicitly declared, while in Python types are dynamically inferred. This means we can assign any type of data to any variable in Python.

# Python code
x = 4
x = "four"

The same thing in C would lead to a compilation error.

/* C code */
int x = 4;
x = "four";  // fails

This flexibility makes Python convenient and easy to use, but it also points to the fact that Python variables are more than just their value. They also contain extra information about the type of the value.

The standard Python implementation (CPython) is written in C.

When we define an integer in Python, for example x = 100, x is not just a “raw” integer. It is actually a pointer to a compound C structure that contains several values. Looking at the source code of Python 3.4 or above, we find that an integer in Python actually contains four pieces:

  • ob_refcnt a reference count that helps Python silently handle memory allocation and deallocation
  • ob_type which encodes the type of the variable
  • ob_size which specifies the size of the following data members
  • ob_digit which contains the actual integer value that we expect the Python variable to represent.

A C integer is a label for a position in memory whose bytes encode an integer value. A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value. All this additional information in Python types comes at a cost, which becomes especially apparent in structures that combine many of these objects.
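
To get a rough feel for this overhead, the sketch below compares the size of one Python integer object with the size of one element of a NumPy array. It assumes CPython on a 64-bit platform; the exact number reported by sys.getsizeof varies between interpreter versions, so treat the printed values as illustrative only.

```python
import sys
import numpy as np

# Size in bytes of a single Python int object (header plus digits).
# This is CPython-specific and may differ between versions/platforms.
py_int_bytes = sys.getsizeof(100)

# A NumPy array stores raw 64-bit integers back to back.
arr = np.arange(1000, dtype=np.int64)

print("bytes per Python int object:  ", py_int_bytes)
print("bytes per NumPy int64 element:", arr.itemsize)
```

Each Python int carries its reference count, type pointer, and size field in addition to the value, while the array stores that bookkeeping once for the whole array.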

Fixed type arrays in python

The built-in array module can be used to create dense arrays of a uniform type:

import array
L = list(range(10))
A = array.array('i',L)
A
## array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here 'i' is a type code indicating that the contents are integers.

Python’s array object provides efficient storage of array-based data; NumPy adds to this efficient operations on that data.

import numpy as np
np.array([1,2,3,5,6])
## array([1, 2, 3, 5, 6])

Unlike Python lists, NumPy arrays are constrained to contain elements of a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

np.array([3.14, 1, 2.0,3 ,5])
## array([ 3.14,  1.  ,  2.  ,  3.  ,  5.  ])
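
As a small sketch of the upcasting rule, the snippet below inspects the dtype NumPy picks for a few mixed inputs. The variable names are only for illustration, and the default integer dtype is platform dependent, so it is printed rather than assumed.

```python
import numpy as np

ints = np.array([1, 2, 3])               # all integers
mixed = np.array([3.14, 1, 2])           # int + float  -> float
with_complex = np.array([1, 2.0, 3j])    # float + complex -> complex

print(ints.dtype)          # platform default integer type (often int64)
print(mixed.dtype)         # float64
print(with_complex.dtype)  # complex128
```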

To explicitly set the data type of the resulting array, use the dtype keyword:

np.array([1,2,3,4], dtype='float32')
## array([ 1.,  2.,  3.,  4.], dtype=float32)

Unlike Python lists, NumPy arrays can be explicitly multi-dimensional:

# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])
## array([[2, 3, 4],
##        [4, 5, 6],
##        [6, 7, 8]])

Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy:

# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)
## array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Create a 3x5 floating-point array filled with ones
np.ones((3,5), dtype=float)
## array([[ 1.,  1.,  1.,  1.,  1.],
##        [ 1.,  1.,  1.,  1.,  1.],
##        [ 1.,  1.,  1.,  1.,  1.]])

# Create a 3x5 array filled with 3.14
np.full((3,5),3.14)
## array([[ 3.14,  3.14,  3.14,  3.14,  3.14],
##        [ 3.14,  3.14,  3.14,  3.14,  3.14],
##        [ 3.14,  3.14,  3.14,  3.14,  3.14]])

# Create an array filled with a linear sequence
# Starting at 0, ending before 20, stepping by 2
np.arange(0, 20, 2)
## array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)
## array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])

# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))
## array([[ 0.92834175,  0.7047232 ,  0.27926941],
##        [ 0.12513209,  0.05822095,  0.36791824],
##        [ 0.11716116,  0.86633976,  0.88031134]])

# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))
## array([[-0.7355636 ,  0.55281132, -0.82559168],
##        [-0.87349029,  0.16691784,  1.74814013],
##        [ 0.11596561, -0.28291431,  1.463579  ]])

# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
## array([[8, 7, 6],
##        [8, 0, 5],
##        [3, 5, 2]])

# Create a 3x3 identity matrix
np.eye(3)
## array([[ 1.,  0.,  0.],
##        [ 0.,  1.,  0.],
##        [ 0.,  0.,  1.]])

# Create an uninitialized array of three values (floating point by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)
## array([ -3.10503618e+231,  -3.10503618e+231,   1.35441149e-306])
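
Two details of these routines are easy to trip over: np.arange excludes its stop value while np.linspace includes it, and np.empty returns floating-point values unless a dtype is given. The short check below is a sketch with illustrative variable names.

```python
import numpy as np

stepped = np.arange(0, 20, 2)   # stop value (20) is excluded
spaced = np.linspace(0, 1, 5)   # stop value (1) is included
empty = np.empty(3)             # default dtype is float64

print(stepped[-1])   # last value produced by arange
print(spaced[-1])    # last value produced by linspace
print(empty.dtype)   # dtype of the uninitialized array
```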

Next, we will cover a few categories of basic array manipulation:

  • Attributes of arrays
  • Indexing of arrays
  • Slicing of arrays
  • Reshaping of arrays
  • Joining and splitting of arrays

import numpy as np
np.random.seed(0)  # for reproducibility

x1 = np.random.randint(10, size= (3,4,5))
x1
## array([[[5, 0, 3, 3, 7],
##         [9, 3, 5, 2, 4],
##         [7, 6, 8, 8, 1],
##         [6, 7, 7, 8, 1]],
## 
##        [[5, 9, 8, 9, 4],
##         [3, 0, 3, 5, 0],
##         [2, 3, 8, 1, 3],
##         [3, 3, 7, 0, 1]],
## 
##        [[9, 9, 0, 4, 7],
##         [3, 2, 7, 2, 0],
##         [0, 4, 5, 5, 6],
##         [8, 4, 1, 4, 9]]])

Attributes of arrays

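Each array has the attributes ndim (the number of dimensions), shape (the size of each dimension), size (the total number of elements), dtype (the data type of the elements), itemsize (the size in bytes of each element), and nbytes (the total size in bytes of the array).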

print(x1.ndim)
## 3
print(x1.shape)
## (3, 4, 5)
print(x1.size)
## 60
print(x1.dtype)
## int64
print(x1.itemsize)
## 8
print(x1.nbytes)
## 480
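
The attributes are related: nbytes equals size times itemsize, and size is the product of the entries in shape. The sketch below checks this on a hypothetical array x3 with an explicit int64 dtype, so the numbers do not depend on the platform's default integer type.

```python
import numpy as np

# Hypothetical array with the same shape as above and an explicit dtype.
x3 = np.zeros((3, 4, 5), dtype=np.int64)

print(x3.size == 3 * 4 * 5)                 # size is the product of the shape
print(x3.nbytes == x3.size * x3.itemsize)   # 60 elements * 8 bytes each

# Halving the element width halves the memory footprint.
x3_small = x3.astype(np.int32)
print(x3_small.nbytes)
```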

Indexing of arrays

Python is zero-indexed, unlike R, where indexing starts from 1. Negative indices count from the end of the array.

x1 = np.array([1,2,3,5,8])
x1[0]
## 1
x1[4]
## 8
x1[-1]
## 8
x1[-2]
## 5
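
The same idea extends to multi-dimensional arrays, where items are accessed with a comma-separated tuple of indices. The sketch below uses a made-up array named grid2d, and also shows one pitfall: because NumPy arrays have a fixed type, a floating-point value assigned into an integer array is silently truncated.

```python
import numpy as np

grid2d = np.array([[3, 5, 2, 4],
                   [7, 6, 8, 8],
                   [1, 6, 7, 7]])

first = grid2d[0, 0]     # row 0, column 0
last = grid2d[-1, -1]    # last row, last column
grid2d[0, 0] = 12        # values can be modified in place

ints = np.array([1, 2, 3, 5, 8])
ints[0] = 3.14159        # truncated to 3: the array holds integers

print(first, last, grid2d[0, 0], ints[0])
```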

Slicing of arrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character.

x[start:stop:step]

x = np.arange(10)
x[5:]  # elements from index 5 onward
## array([5, 6, 7, 8, 9])
x[4:7] # middle sub-array
## array([4, 5, 6])
x[::2]  # every other element
## array([0, 2, 4, 6, 8])
x[::-1]  # all elements, reversed
## array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
x[5::-2]  # reversed every other from index 5
## array([5, 3, 1])

Let’s take another example, this time slicing a multi-dimensional array.

x2 = np.random.randint(10,size= (3,4))
x2
## array([[8, 1, 1, 7],
##        [9, 9, 3, 6],
##        [7, 2, 0, 3]])
x2[:2, :3]  # two rows, three columns
## array([[8, 1, 1],
##        [9, 9, 3]])
x2[:3, ::2]  # all rows, every other column
## array([[8, 1],
##        [9, 3],
##        [7, 0]])
x2[::-1, ::-1]  # rows and columns both reversed
## array([[3, 0, 2, 7],
##        [6, 3, 9, 9],
##        [7, 1, 1, 8]])
x2
## array([[8, 1, 1, 7],
##        [9, 9, 3, 6],
##        [7, 2, 0, 3]])

One commonly needed routine is accessing a single row or column of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

print(x2[:,0])  # first column of x2
## [8 9 7]
print(x2[0,:])  # first row of x2
## [8 1 1 7]

👉 In NumPy, array slices return views rather than copies of the array data. Slicing a Python list, by contrast, returns a copy.

Let’s take an example.

print(x2)
## [[8 1 1 7]
##  [9 9 3 6]
##  [7 2 0 3]]

The above is our 2-dimensional array.

x2_sub = x2[:2,:2]
print(x2_sub)
## [[8 1]
##  [9 9]]

Now if we modify this subarray, we will see that the original array is changed!

x2_sub[0,0]=99
print(x2_sub)
## [[99  1]
##  [ 9  9]]

Observe below :

print(x2)
## [[99  1  1  7]
##  [ 9  9  3  6]
##  [ 7  2  0  3]]

👉 This behavior is useful when working with large datasets: we can access and process pieces of them without copying the underlying data buffer.

Despite the nice features of array views, it is sometimes useful to explicitly copy the data within an array or a subarray. This is most easily done with the copy() method.

x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)
## [[99  1]
##  [ 9  9]]

If we now modify this subarray, the original array is not touched.

x2_sub_copy[0,0] = 42
print(x2_sub_copy)
## [[42  1]
##  [ 9  9]]
print(x2)
## [[99  1  1  7]
##  [ 9  9  3  6]
##  [ 7  2  0  3]]
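
One way to check whether two arrays share the same underlying data is np.shares_memory. The sketch below uses illustrative names and contrasts a slice, an explicit copy, and a slice of a plain Python list.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
view = arr[:2, :2]             # basic slice: a view onto arr's data
copied = arr[:2, :2].copy()    # explicit copy: independent data

print(np.shares_memory(arr, view))     # True
print(np.shares_memory(arr, copied))   # False

# A list slice, by contrast, is always a new list.
lst = [0, 1, 2, 3]
lst_slice = lst[:2]
lst_slice[0] = 99
print(lst)                             # the original list is unchanged
```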
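
Reshaping of arrays

Another useful operation is reshaping an array, which is most flexibly done with the reshape method. For example, to put the numbers 1 through 9 in a 3x3 grid: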

grid  = np.arange(1,10).reshape((3,3))
grid
## array([[1, 2, 3],
##        [4, 5, 6],
##        [7, 8, 9]])
x = np.array([1,2,3])

# row vector via reshape
x.reshape((1,3))
## array([[1, 2, 3]])

# row vector via newaxis
x[np.newaxis, :]
## array([[1, 2, 3]])

# column vector via reshape
x.reshape((3,1))
## array([[1],
##        [2],
##        [3]])

# column vector via newaxis
x[:, np.newaxis]
## array([[1],
##        [2],
##        [3]])
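
For reshape to work, the size of the initial array must match the size of the reshaped array. The sketch below, with illustrative variable names, shows that one dimension can be given as -1 so NumPy infers it, and that a mismatched size raises a ValueError.

```python
import numpy as np

flat = np.arange(6)

# -1 asks NumPy to infer that dimension from the total size.
inferred = flat.reshape((2, -1))
print(inferred.shape)

# 6 elements cannot fill a 4x2 array, so this raises ValueError.
try:
    flat.reshape((4, 2))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```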

Array concatenation and splitting

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as we can see here:

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
## array([1, 2, 3, 3, 2, 1])

It can also be used for two-dimensional arrays:

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
                 

Concatenate along the first axis:

np.concatenate([grid, grid])
## array([[1, 2, 3],
##        [4, 5, 6],
##        [1, 2, 3],
##        [4, 5, 6]])

Concatenate along the second axis (axes are zero-indexed, so this is axis=1):

np.concatenate([grid, grid], axis=1)
## array([[1, 2, 3, 1, 2, 3],
##        [4, 5, 6, 4, 5, 6]])

For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])
## array([[1, 2, 3],
##        [9, 8, 7],
##        [6, 5, 4]])
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
## array([[ 9,  8,  7, 99],
##        [ 6,  5,  4, 99]])

Splitting of Arrays

The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit.

For each of these, we can pass a list of indices giving the split points. N split points lead to N + 1 subarrays:

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
## [1 2 3] [99 99] [3 2 1]
grid = np.arange(16).reshape((4, 4))
grid
## array([[ 0,  1,  2,  3],
##        [ 4,  5,  6,  7],
##        [ 8,  9, 10, 11],
##        [12, 13, 14, 15]])
upper, lower = np.vsplit(grid, [2])
print(upper)
## [[0 1 2 3]
##  [4 5 6 7]]
print(lower)
## [[ 8  9 10 11]
##  [12 13 14 15]]
left, right = np.hsplit(grid, [2])
print(left)
## [[ 0  1]
##  [ 4  5]
##  [ 8  9]
##  [12 13]]
print(right)
## [[ 2  3]
##  [ 6  7]
##  [10 11]
##  [14 15]]
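
Splitting and concatenation are inverse operations. As a quick sketch with illustrative names, splitting an array at two points gives three pieces, and concatenating the pieces recovers the original array.

```python
import numpy as np

values = np.array([1, 2, 3, 99, 99, 3, 2, 1])

pieces = np.split(values, [3, 5])   # two split points -> three subarrays
rejoined = np.concatenate(pieces)   # joining them restores the original

print(len(pieces))
print(np.array_equal(rejoined, values))
```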

List Vs Array

# Let's consider a list
L = [1, 2, 3]

# Let's consider an array
A = np.array([1, 2, 3])

Let’s print each element of the list:

for e in L:
    print(e)
## 1
## 2
## 3

Let’s print each element of the array:

for e in A:
    print(e)
## 1
## 2
## 3

We don’t see any difference here. Let’s append a value to each, starting with the list:

L.append(4)
L
## [1, 2, 3, 4]

Now let’s see what happens when we append a value to the NumPy array.

A.append(4)  # raises AttributeError: 'numpy.ndarray' object has no attribute 'append'

So NumPy arrays have no append method.
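
If appending is really needed, NumPy offers the function np.append (and np.concatenate), which return a new array instead of modifying the existing one in place. A minimal sketch, with B and C as illustrative names:

```python
import numpy as np

A = np.array([1, 2, 3])

B = np.append(A, 4)                 # new array; A itself is unchanged
C = np.concatenate([A, [4, 5]])     # same idea for several values

print(A)
print(B)
print(C)
```

Because each call allocates a new array, growing an array this way inside a loop is costly; building a list first and converting it once is usually the better design.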

There is another way to append values to a list:

L = L + [5]
L
## [1, 2, 3, 4, 5]

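Let’s try the same with the NumPy array:

A = A + [4, 5]  # raises ValueError: operands could not be broadcast together with shapes (3,) (2,)
A

This doesn’t work either: with a NumPy array, + means element-wise addition, so NumPy tries to add a two-element list to a three-element array and fails.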
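
The failure comes from the shapes, not from mixing a list with an array. In the sketch below (illustrative names), adding a list of the wrong length raises a ValueError, while a list of matching length is added element by element.

```python
import numpy as np

A = np.array([1, 2, 3])

# Lengths 3 and 2 cannot be combined element-wise.
try:
    A + [4, 5]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A list of matching length is converted and added element by element.
same_length = A + [4, 5, 6]

print(error_message)
print(same_length)
```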

Now let’s try vector addition. To keep it simple, we will add a vector to itself.

L2 =[]
for e in L:
    L2.append(e + e)
L2
## [2, 4, 6, 8, 10]

How do we do this in NumPy?

A + A
## array([2, 4, 6])

It did exactly what we wanted.

The + sign on lists does concatenation, whereas on NumPy arrays it does element-wise (vector) addition. If you have a matrix, which is a two-dimensional array, it will do matrix addition.
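
A small sketch of the two-dimensional case, using a made-up 2x2 matrix M: the array version adds entry by entry, while the nested-list version is simply concatenated.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])
summed = M + M             # element-wise matrix addition

nested = [[1, 2], [3, 4]]
joined = nested + nested   # list concatenation: four rows, nothing added

print(summed)
print(joined)
```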

Let’s try another operation on the list and the NumPy array.

2*A
## array([2, 4, 6])

Let’s try the same with the list:

2*L
## [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

Multiplying a list by 2 repeats the list. If you want to multiply every element of a list, you have to use a for loop.

Now let’s try element-wise squaring:

L2 =[]
for e in L:
    L2.append(e*e)
    
L2
## [1, 4, 9, 16, 25]

L**2 raises a TypeError. Try it yourself.

Let’s try with NumPy:

A**2
## array([1, 4, 9])

In NumPy, most functions work element-wise.

Some operations on a NumPy array:

np.sqrt(A)
## array([ 1.        ,  1.41421356,  1.73205081])
np.log(A)
## array([ 0.        ,  0.69314718,  1.09861229])
np.exp(A)
## array([  2.71828183,   7.3890561 ,  20.08553692])

To do the above operations on a list, you need a for loop that applies the operation to each element individually.

A NumPy array is more convenient than a list for representing a vector, because operating on a list requires an explicit for loop, and for loops in Python are slow.
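
To see the speed difference, one can time both approaches with the standard timeit module. This is only a rough sketch: the helper name square_with_loop and the array size are arbitrary choices, and the absolute timings depend on the machine.

```python
import timeit
import numpy as np

values = list(range(100_000))
array = np.arange(100_000, dtype=np.int64)

def square_with_loop(seq):
    """Square every element using an explicit Python loop."""
    out = []
    for v in seq:
        out.append(v * v)
    return out

# Run each version 10 times and report the total time in seconds.
loop_time = timeit.timeit(lambda: square_with_loop(values), number=10)
numpy_time = timeit.timeit(lambda: array ** 2, number=10)

print("for loop:", loop_time)
print("NumPy:   ", numpy_time)
```

On typical hardware the NumPy version is expected to be considerably faster, since the loop runs in compiled code rather than in the Python interpreter.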

Dot Product

The dot product is a type of multiplication you can perform on vectors. There are two definitions of the dot product:

  1. $a \cdot b = a^T b = \sum_{d=1}^{D} a_d b_d$

  2. $a \cdot b = |a| |b| \cos\theta_{ab}$

a = np.array([1,2])
b = np.array([3,4])
dot = 0
for e, f in zip(a,b):
    dot += e*f
dot
## 11

Another interesting operation on NumPy arrays is multiplying two arrays together.

a*b
## array([3, 8])

This gives us the element-wise multiplication of the two arrays. Now we need to sum everything together.

np.sum(a*b)
## 11

sum is also an instance method of the NumPy array itself, so an alternative way is:

(a*b).sum()
## 11

There is a more convenient way in NumPy:

np.dot(a,b)
## 11
# or, equivalently
a.dot(b)
## 11

Let’s calculate the cosine of the angle between a and b by combining the two definitions. For that we need the magnitude of each vector.

# Calculating the magnitude of `a` using a simple method

amag = np.sqrt((a*a).sum())
amag
## 2.2360679774997898

NumPy has a function for this; it is part of NumPy’s linalg module.

amag = np.linalg.norm(a)
amag
## 2.2360679774997898

We get the same answer.

Let’s calculate the cosine of the angle:

cosangle = a.dot(b)/(np.linalg.norm(a)* np.linalg.norm(b))
cosangle
## 0.98386991009990743
angle = np.arccos(cosangle)  # angle in radians
angle
## 0.17985349979247847
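
As a sanity check, the sketch below converts the angle to degrees with np.degrees and confirms that the second definition, the product of the magnitudes and the cosine of the angle, reproduces the dot product of 11. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

cos_angle = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))
angle_rad = np.arccos(cos_angle)
angle_deg = np.degrees(angle_rad)   # roughly 10.3 degrees

# |a| |b| cos(theta) should match a . b up to floating-point error.
reconstructed = np.linalg.norm(a) * np.linalg.norm(b) * np.cos(angle_rad)

print(angle_deg)
print(np.isclose(reconstructed, a.dot(b)))
```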