How to use the NumPy concatenate function
This article was originally published at https://www.sharpsightlabs.com
This tutorial will explain how to use the NumPy concatenate function in Python (which is sometimes called np.concatenate).
This post will cover:
- What the NumPy concatenate function does
- The syntax of NumPy concatenate
- Examples of how to use NumPy concatenate
First, I’ll start by explaining what the concatenate function does.
NumPy concatenate joins together NumPy arrays
So what is the concatenate function?
The NumPy concatenate function is a function from the NumPy package. NumPy (if you’re not familiar) is a data manipulation package in the Python programming language. We use NumPy to “wrangle” numeric data in Python.
NumPy concatenate essentially combines together multiple NumPy arrays.
There are a couple of things to keep in mind.
First, NumPy concatenate isn’t exactly like a traditional database join. It’s more like stacking NumPy arrays.
Second, the concatenate function can operate both vertically and horizontally. You can concatenate arrays together vertically, or you can concatenate arrays together horizontally.
Later in the examples section, I’ll show you how to use concatenate both ways.
Before we discuss concrete examples though, let’s quickly look at the syntax of the np.concatenate function.
The syntax of NumPy concatenate
The syntax of NumPy concatenate is fairly straightforward, particularly if you’re familiar with other NumPy functions.
Syntactically, there are a few main parts of the function: the name of the function, and several parameters inside of the function that we can manipulate.
In Python code, the concatenate function is typically written as np.concatenate(), although you might also see it written as numpy.concatenate(). Either case assumes that you’ve imported the NumPy package with the code import numpy as np or import numpy, respectively.
Moving forward, this tutorial will assume that you’ve imported NumPy by executing the code import numpy as np.
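As a quick check of the two import styles, the short sketch below imports NumPy both ways and confirms that both spellings reach the same function. It is only an illustration; the variable name same_function is made up for this example.

```python
import numpy
import numpy as np

# Both names refer to the same module, so both spellings reach the same function.
same_function = np.concatenate is numpy.concatenate
print(same_function)
```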
The parameters and arguments of NumPy concatenate
There are two main parameters of the np.concatenate function:
- a sequence of input arrays (the arrays that you will concatenate together)
- the axis parameter (the axis along which the arrays will be joined)
Let’s take a look at each of these separately.
The input arrays
When you use the np.concatenate function, you need to provide the input arrays that you want to join, typically two or more.
There are a few important points that you should know about the input arrays for np.concatenate.
The input arrays should be provided in a Python sequence
In a call such as np.concatenate((arr1, arr2)), notice that the arrays, arr1 and arr2, are enclosed inside of parentheses. Because they are enclosed in parentheses, they are essentially being passed to the concatenate function as a Python tuple. Alternatively, you could enclose them inside of brackets (i.e., [arr1, arr2]), which would pass them to concatenate as a Python list.
Either method is acceptable: you can provide the input arrays in a list or a tuple. What’s important to understand is that you need to provide the input arrays to the concatenate function within some type of Python sequence. Tuples and lists are both types of Python sequences.
If you’re a little confused about this, I suggest that you review Python sequences.
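To make the tuple-versus-list point concrete, here is a minimal sketch. The arrays arr1 and arr2 and their values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical input arrays, named arr1 and arr2 for illustration.
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

from_tuple = np.concatenate((arr1, arr2))  # inputs passed as a tuple
from_list = np.concatenate([arr1, arr2])   # inputs passed as a list

# The container type does not change the result.
print(np.array_equal(from_tuple, from_list))
```

Either form should give the same 4-by-2 array, so the choice between a tuple and a list is mostly a matter of style.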
The input arrays should often be the same data type
Another point that I’ll make is that the input arrays should usually contain data of the same data type.
The data types don’t have to be the same, but things are simpler when they are.
The issue here is that, if the input arrays that you give to NumPy concatenate have different datatypes, then the function will try to re-cast the data of one array to the data type of the other.
For example, let’s say that you create two NumPy arrays and pass them to np.concatenate. One NumPy array contains integers, and one array contains floats.
integer_data = np.array([[1,1,1],[1,1,1]], dtype = 'int')
float_data = np.array([[9,9,9],[9,9,9]], dtype = 'float')
np.concatenate([integer_data, float_data])
When you run this, you can see that all of the numbers in the output array are floats.
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 9.,  9.,  9.],
       [ 9.,  9.,  9.]])
Why? Some of the inputs were integers, right?
A NumPy array must contain numbers that all have the same data type. If the inputs to np.concatenate have different data types, it will re-cast some of the numbers so that all of the data in the output have the same type. (NumPy promotes the inputs to a common data type that can represent all of them, so here it re-casts the integers into floats.)
Ultimately, you need to be careful when working with NumPy arrays that have different data types. The behavior of NumPy concatenate in those cases may have unintended consequences.
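One way to see what happened is to inspect the dtype attribute before and after concatenating. The sketch below reuses the integer and float arrays from the example; the name combined is made up for this illustration.

```python
import numpy as np

integer_data = np.array([[1, 1, 1], [1, 1, 1]], dtype='int')
float_data = np.array([[9, 9, 9], [9, 9, 9]], dtype='float')

combined = np.concatenate([integer_data, float_data])

# Inspect the data types before and after concatenation.
print(integer_data.dtype.kind)  # 'i' means a signed integer type
print(float_data.dtype)         # float64
print(combined.dtype)           # float64
```

Checking the dtype of the result like this is a cheap way to catch an unintended conversion early.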
You can concatenate together many arrays
In the examples I’ll show later in this tutorial, we’ll mostly work with two arrays. We’ll concatenate together only two.
Keep in mind, however, that it’s possible to concatenate together a large sequence of NumPy arrays. More than two. You can do three, or four, or more.
Having said that, if you’re just getting started with NumPy, I recommend that you learn and practice the syntax with very simple examples. Stick with two arrays in the beginning.
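For reference, joining more than two arrays looks the same as joining two. This is a minimal sketch; the three one-row arrays and the name stacked are made up for illustration.

```python
import numpy as np

# Three small 2-d arrays, each with one row and three columns.
np_array_1s = np.array([[1, 1, 1]])
np_array_5s = np.array([[5, 5, 5]])
np_array_9s = np.array([[9, 9, 9]])

# All three arrays go into the same sequence.
stacked = np.concatenate([np_array_1s, np_array_5s, np_array_9s])
print(stacked)
print(stacked.shape)  # (3, 3)
```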
The axis parameter
Now that we’ve talked about the input arrays, let’s talk about how the np.concatenate() function puts them together.
As I mentioned earlier in this tutorial, the concatenate function can join together arrays vertically or horizontally.
The behavior of np.concatenate – whether it concatenates the numpy arrays vertically or horizontally – depends on the axis parameter.
A quick introduction to NumPy array axes
I have to be honest. One of the hardest things for beginners to understand in NumPy is array axes.
For a variety of reasons, array axes are just hard to understand. The naming conventions (axis 0, axis 1, etc) are a little abstract. And the documentation about axes is not always 100% clear. Ultimately, these factors make array axes a little un-intuitive.
Be that as it may, to understand how to use NumPy concatenate with the axis parameter, you need to understand how NumPy array axes work.
With that in mind, let’s try to shed a little light on array axes.
First, let’s start with the basics. NumPy arrays have what we call axes.
The term “axis” seems to confuse people in the context of NumPy arrays, so let’s take a look at a more familiar example. Take a look at a Cartesian coordinate system.
A Cartesian coordinate system has axes. Specifically, we typically refer to the horizontal axis as the x axis, and the vertical axis as the y axis. Almost everyone should be familiar with this.
In Cartesian space, these axes are just directions. Moreover, an observation at a point in a Cartesian space can be defined by its value along each axis. So for example, we can identify a point in a Cartesian space by specifying how many units to travel along the x axis, and how many units to travel along the y axis.
NumPy array axes are the directions along the rows and columns
Axes in a NumPy array are very similar. Axes in a NumPy array are just directions: axis 0 is the direction running vertically down the rows and axis 1 is the direction running horizontally across the columns.
Remember also that in Python, things are indexed starting with “0” (e.g., the first element in a list is actually at index 0). Similarly, the “first” axis in a NumPy array is “axis 0.”
Ultimately though, when we say “axis 0” we’re talking about the direction that points down the rows, and when we say “axis 1” we’re talking about the direction that points across the columns.
And just like in a Cartesian coordinate system, we can use this system of axes to identify particular cells in the dataset. We can identify a particular location in a NumPy array by specifying how many units on the 0-axis and how many units on the 1-axis. It’s very similar to how we identify particular points at locations in an x/y coordinate space.
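Here is a small sketch of that idea. The array grid and its values are hypothetical; the point is only that a position along axis 0 and a position along axis 1 together pick out one cell.

```python
import numpy as np

# A 2-d array with 2 rows and 3 columns (values chosen for illustration).
grid = np.array([[10, 11, 12],
                 [20, 21, 22]])

print(grid.ndim)   # number of axes: 2
print(grid.shape)  # (2, 3): length 2 along axis 0, length 3 along axis 1

# Move 1 step along axis 0 (down the rows) and 2 steps along axis 1 (across the columns).
cell = grid[1, 2]
print(cell)  # 22
```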
How we use axes in NumPy concatenate
Now that we’ve talked about axes in general, let’s talk about how they operate with respect to the concatenate function.
Remember what I mentioned earlier in this tutorial: we can concatenate NumPy arrays horizontally or we can concatenate NumPy arrays vertically.
Which one we do is specified by the axis parameter.
If we set axis = 0, the concatenate function will concatenate the NumPy arrays vertically.
(By the way, this is the default behavior. If you don’t specify the axis, the default behavior will be axis = 0.)
On the other hand, if we manually set axis = 1, the concatenate function will concatenate the NumPy arrays horizontally.
NumPy concatenate is like “stacking” NumPy arrays
A lot of people still find this to be un-intuitive, so I’ll quickly explain it another way.
The best way to think of NumPy concatenate is to think of it like stacking arrays, either vertically or horizontally.
The axis that we specify with the axis parameter is the axis along which we stack the arrays.
So when we set axis = 0, we are stacking along axis 0. Axis 0 is the axis that runs vertically down the rows, so this amounts to stacking the arrays vertically.
Similarly, when we set axis = 1, we’re stacking along axis 1. Axis 1 is the axis that runs horizontally across the columns, so this amounts to stacking the arrays horizontally.
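Another way to see the stacking idea is to look at the shape of the result. In the sketch below (the arrays first and second are made up for illustration), only the length along the chosen axis grows.

```python
import numpy as np

first = np.zeros((2, 3))   # 2 rows, 3 columns
second = np.ones((2, 3))   # same shape

vertical = np.concatenate([first, second], axis=0)
horizontal = np.concatenate([first, second], axis=1)

print(vertical.shape)    # (4, 3): axis 0 grew, axis 1 is unchanged
print(horizontal.shape)  # (2, 6): axis 1 grew, axis 0 is unchanged
```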
If this still seems a little confusing, that’s OK.
To help clear things up, we’re going to move on to some concrete examples that you can run yourself. Understanding how np.concatenate works will be easier when you have some real examples that you can play with.
Examples: how to use np.concatenate
Ok, let’s work with some real examples.
Before you start, run this code
Before you get started with these examples, you’ll need to import the NumPy package into your development environment.
You can do that with the import statement as follows:
import numpy as np
This will enable you to refer to NumPy as np when you call the concatenate function.
Concatenate two NumPy arrays
First, let’s just concatenate together two simple NumPy arrays.
Create NumPy arrays
To do this, we’ll first create two NumPy arrays with the np.array function.
np_array_1s = np.array([[1,1,1],[1,1,1]])
np_array_9s = np.array([[9,9,9],[9,9,9]])
Now, let’s print them out with print(np_array_1s) and print(np_array_9s):
[[1 1 1]
 [1 1 1]]
[[9 9 9]
 [9 9 9]]
Basically, we have two simple NumPy arrays, each with two rows and three columns.
Concatenate together arrays with np.concatenate
Now, let’s combine them together using NumPy concatenate.
np.concatenate([np_array_1s, np_array_9s])
When you run this, it produces the following output:
array([[1, 1, 1],
       [1, 1, 1],
       [9, 9, 9],
       [9, 9, 9]])
Notice what’s happened here. The concatenate function has combined the two arrays together vertically. Essentially, the concatenate function has combined them together and has defaulted to axis = 0.
Concatenate NumPy arrays vertically
Next, we’re going to concatenate the arrays together vertically again, but this time we’re going to do it explicitly with the axis parameter.
In this example, we’re going to reuse the two arrays that we created earlier, np_array_1s and np_array_9s.
To explicitly concatenate them together vertically, we need to set axis = 0.
np.concatenate([np_array_1s, np_array_9s], axis = 0)
Which produces the following output:
array([[1, 1, 1],
       [1, 1, 1],
       [9, 9, 9],
       [9, 9, 9]])
Notice that this is the same as if we had used concatenate without specifying the axis. By default, the np.concatenate function sets axis = 0.
Concatenate NumPy arrays horizontally
Finally, let’s concatenate the two arrays horizontally.
To do this, we need to set axis = 1.
np.concatenate([np_array_1s, np_array_9s], axis = 1)
Which produces the following output:
array([[1, 1, 1, 9, 9, 9],
       [1, 1, 1, 9, 9, 9]])
Remember that axis 1 is the axis that runs horizontally across the columns. So when we set axis = 1, the concatenate function is essentially combining the two arrays in that direction … horizontally.
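As a cross-check on the “vertical” and “horizontal” readings, NumPy also provides np.vstack and np.hstack. This tutorial does not otherwise cover them, so treat the sketch below as a side note: for 2-d arrays like these, they should match np.concatenate with axis = 0 and axis = 1 respectively.

```python
import numpy as np

np_array_1s = np.array([[1, 1, 1], [1, 1, 1]])
np_array_9s = np.array([[9, 9, 9], [9, 9, 9]])

vertical = np.concatenate([np_array_1s, np_array_9s], axis=0)
horizontal = np.concatenate([np_array_1s, np_array_9s], axis=1)

# For 2-d inputs, np.vstack and np.hstack give the same results.
print(np.array_equal(vertical, np.vstack([np_array_1s, np_array_9s])))
print(np.array_equal(horizontal, np.hstack([np_array_1s, np_array_9s])))
```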
Be careful concatenating 1-d arrays
Before ending this NumPy concatenate tutorial, I want to give you a quick warning about working with 1 dimensional NumPy arrays.
If you want to concatenate together two 1-dimensional NumPy arrays, things won’t work exactly the way you expect.
Let’s say we have two 1-dimensional arrays:
np_array_1s_1dim = np.array([1,1,1])
np_array_9s_1dim = np.array([9,9,9])
And let’s concatenate them together using axis = 0:
np.concatenate([np_array_1s_1dim, np_array_9s_1dim], axis = 0)
Here’s the output:
array([1, 1, 1, 9, 9, 9])
Why are they being concatenated together horizontally? If we set axis = 0, shouldn’t this concatenate them together vertically?
No, not in this case.
This is a little subtle, and it all comes down to axes.
Think about what we have here. Both of the input arrays are one dimensional.
Because they are one dimensional, there is only one axis. Axis 0 is the only axis they have!
Moreover, in the case of a 1-d array, axis 0 actually points along the observations. It points in the direction of the index.
So when we use np.concatenate in this case, it is still concatenating them along axis 0. The issue is that because they are 1-d arrays, axis 0 points horizontally along the observations.
In any event, the concatenate function works “fine” in this case, but you need to really understand NumPy axes to understand its behavior.
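You can confirm that a 1-d array has only one axis by checking its ndim and shape attributes. The sketch below also shows one possible workaround, on the assumption that what you actually want is a 2-d result with one input per row: reshape each 1-d array into a single row first.

```python
import numpy as np

np_array_1s_1dim = np.array([1, 1, 1])
np_array_9s_1dim = np.array([9, 9, 9])

print(np_array_1s_1dim.ndim)   # 1: there is only axis 0
print(np_array_1s_1dim.shape)  # (3,)

# One possible workaround (an assumption about what you might want):
# reshape each array into a single 2-d row, then concatenate along axis 0.
row_1s = np_array_1s_1dim.reshape(1, -1)
row_9s = np_array_9s_1dim.reshape(1, -1)
stacked_rows = np.concatenate([row_1s, row_9s], axis=0)
print(stacked_rows.shape)  # (2, 3)
```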
Concatenating 1-d arrays with axis = 1 causes an error
A related issue arises when you try to concatenate together two 1-dimensional NumPy arrays with axis = 1.
If you try this, you will get an error.
For example, take a look at the following code:
np.concatenate([np_array_1s_1dim, np_array_9s_1dim], axis = 1)
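When you run this, you’ll get an error. In older versions of NumPy it looks like the following (newer versions raise an AxisError with a similar message):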
IndexError: axis 1 out of bounds [0, 1)
What’s going on here?
Again, this is a bit subtle, but it makes sense if you think about it.
The input arrays that we’ve used here are one dimensional.
When we use the syntax axis = 1, we’re asking the concatenate function to concatenate the arrays along the second axis. Remember that in NumPy, the first axis is “axis 0” and the second axis is “axis 1.” The axes are numbered starting from 0 (just like Python indexes).
Here’s the problem though: in a 1-dimensional NumPy array, there is no second axis. In a 1-d array, the only axis is axis 0. There is no second axis (“axis 1”) along which we can concatenate the arrays.
Once again, this is subtle, but it makes sense when you understand how NumPy axes work.
Just be careful, and make sure you think through the structure of your arrays before you use NumPy concatenate.
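If you want to see the failure without stopping your script, you can wrap the call in a try/except block. This is a minimal sketch: the flag name failed is made up for illustration, and the exact exception class and message depend on your NumPy version.

```python
import numpy as np

np_array_1s_1dim = np.array([1, 1, 1])
np_array_9s_1dim = np.array([9, 9, 9])

try:
    np.concatenate([np_array_1s_1dim, np_array_9s_1dim], axis=1)
    failed = False
except (IndexError, ValueError) as err:
    # Older NumPy versions raise IndexError; newer ones raise AxisError,
    # which is a subclass of both ValueError and IndexError.
    failed = True
    print(type(err).__name__, err)
```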
If you want to learn data science in Python, learn NumPy
NumPy concatenate is only one data manipulation tool in Python’s NumPy package.
If you want to be great at data science in Python, you’ll need to learn more about NumPy.