Double boolean indexing in Python?
Reposted from Stack Overflow
>>> from numpy import array, arange
>>> a = arange(12).reshape(3,4)
>>> b1 = array([False,True,True])         # first dim selection
>>> b2 = array([True,False,True,False])   # second dim selection
>>>
>>> a[b1,:]                               # selecting rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[b1]                                 # same thing
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[:,b2]                               # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>>
>>> a[b1,b2]                              # a weird thing to do
array([ 4, 10])
I expected:
array([[ 4,  6],
       [ 8, 10]])
Can you explain why a[b1,b2] returns a 1-D array instead?
A:
Let's start with your array:
import numpy as np

a = np.array([[ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11]])
Your current indexing logic equates to the following:
a[[1, 2], [0, 2]] # array([ 4, 10])
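Sticking to 2 dimensions, NumPy interprets this as indexing dim1-indices [1, 2] and dim2-indices [0, 2], that is, coordinates (1, 0) and (2, 2). Both index arrays have the same shape, so they are matched element by element; nothing broadcasts them into a 2x2 grid of row/column combinations.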
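As a minimal sketch of the pairing described above, the snippet below converts each mask to integer positions with np.nonzero and compares the result with an explicit loop over coordinate pairs. The names rows, cols, paired and manual are illustrative only and do not appear in the original question.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

# Each boolean mask stands for the integer positions of its True entries.
rows = np.nonzero(b1)[0]   # positions 1 and 2
cols = np.nonzero(b2)[0]   # positions 0 and 2

# Same-shaped index arrays are paired element by element: (1, 0) and (2, 2).
paired = a[rows, cols]

# The same result spelled out as an explicit loop over coordinate pairs.
manual = np.array([a[r, c] for r, c in zip(rows, cols)])

print(paired)   # [ 4 10]
print(manual)   # [ 4 10]
```

This only works here because both masks happen to contain two True values; with different counts the index arrays cannot be paired and NumPy raises an IndexError.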
To permit broadcasting with Boolean arrays, you can use numpy.ix_:
res = a[np.ix_(b1, b2)]
print(res)
# [[ 4  6]
#  [ 8 10]]
The magic ix_ performs is noted in the docs: "Boolean sequences will be interpreted as boolean masks for the corresponding dimension (equivalent to passing in np.nonzero(boolean_sequence))."
print(np.ix_(b1, b2))
# (array([[1],
#        [2]], dtype=int64), array([[0, 2]], dtype=int64))
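The shapes printed by ix_ suggest what is going on: a (2, 1) row index broadcasts against a (1, 2) column index to give a (2, 2) result. The sketch below, which is an illustration rather than part of the original answer, reproduces that by hand and also shows a two-step alternative; the names by_broadcast, by_chaining and via_ix are hypothetical.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

rows = np.nonzero(b1)[0]
cols = np.nonzero(b2)[0]

# Turn the row indices into a column vector of shape (2, 1); it then
# broadcasts against cols of shape (2,) to select a (2, 2) block.
by_broadcast = a[rows[:, None], cols]

# Alternative: filter rows first, then filter columns of the result.
by_chaining = a[b1][:, b2]

# Reference result from np.ix_.
via_ix = a[np.ix_(b1, b2)]

print(by_broadcast)
print(np.array_equal(by_broadcast, via_ix))  # True
print(np.array_equal(by_chaining, via_ix))   # True
```

The chained form creates an intermediate copy of the selected rows, so np.ix_ or the explicit column vector is usually the better fit when the result needs to be assigned to, for example a[np.ix_(b1, b2)] = 0.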