I. NumPy

Import

import numpy as np

Creating arrays

1-D and 2-D arrays: np.array

# 1-D array
arr = np.array([1, 3, 5, 7])
print(arr)    # [1 3 5 7]
# 2-D array
arr = np.array([[1, 3, 5, 7], [2, 4, 6, 8]])
print(arr)
# [[1 3 5 7]
#  [2 4 6 8]]

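Arrays over a range: np.arange

# 1-D array over a half-open range
arr = np.arange(1, 10, 2)    # start included, stop excluded, step 2; the step defaults to 1 if omitted
print(arr)                   # [1 3 5 7 9]
arr = np.arange(10)          # start defaults to 0, step defaults to 1, stop excluded
print(arr)                   # [0 1 2 3 4 5 6 7 8 9]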

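Arrays of all zeros: np.zeros

arr = np.zeros(3)         # 1-D
print(arr)                # [0. 0. 0.]
arr = np.zeros((3, 2))    # 2-D; the shape is passed as a tuple
print(arr)
# [[0. 0.]
#  [0. 0.]
#  [0. 0.]]

Arrays of all ones: np.ones

arr = np.ones(3)          # 1-D
print(arr)                # [1. 1. 1.]
arr = np.ones((3, 2))     # 2-D; the shape is passed as a tuple
print(arr)
# [[1. 1.]
#  [1. 1.]
#  [1. 1.]]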

Identity matrix (2-D): np.eye

The main diagonal is all ones and every other element is zero.

arr = np.eye(3)
print(arr)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

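Random numbers and random arrays

Random data is generated mainly with NumPy's random module.

Uniform values in [0, 1): np.random.rand()

arr = np.random.rand(3)       # 1-D
print(arr)
# [0.8030708  0.78286303 0.50264369]   (sample output; values differ on each run)
arr = np.random.rand(3, 2)    # 2-D
print(arr)
# [[0.66689524 0.84930252]
#  [0.38527899 0.54416742]
#  [0.77398462 0.93349523]]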

np.random.rand(2, 2) is equivalent to np.random.random(size=(2, 2)); rand takes the dimensions as separate arguments, random takes them as a tuple.

np.random.random(size=(2, 2))
# array([[0.25303772, 0.45417512],
#        [0.76053763, 0.12454433]])   (sample output)

Standard normal distribution: np.random.randn()

Samples come from the normal distribution with mean 0 and standard deviation 1, written N(0, 1).

arr = np.random.randn(5)       # 1-D
print(arr)
# [-0.47447338 -1.20679413  1.44442981 -1.00593146  0.68298953]   (sample output)
arr = np.random.randn(3, 5)    # 2-D
print(arr.shape)               # (3, 5); values follow N(0, 1), so about half of them are negative

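Random integers: np.random.randint()

Parameters

np.random.randint(low, high=None, size=None, dtype=int)

If only low is given, the range is [0, low).
If high is given, the range is [low, high). The result has shape size and holds uniformly distributed integers.
If size is an int, the result is a 1-D array; if size is a tuple, the result is a multi-dimensional array; if size is omitted, a single integer is returned.

Examples

arr = np.random.randint(1, 10, 10)           # 10 random integers in [1, 10)
print(arr)                                   # e.g. [4 1 2 8 2 6 4 7 2 6]
arr = np.random.randint(10, size=10)         # 10 random integers in [0, 10)
print(arr)                                   # e.g. [7 2 5 9 5 8 2 6 8 3]
arr = np.random.randint(10, size=(2, 3))     # a 2 x 3 array of random integers in [0, 10)
print(arr)
# e.g. [[2 3 7]
#       [5 8 3]]
arr = np.random.randint(2, size=10)          # 10 random values, each 0 or 1
print(arr)                                   # e.g. [1 0 0 0 1 1 0 0 1 0]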

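Random sampling: np.random.choice()

Parameters

np.random.choice(a, size=None, replace=True, p=None)

Draws size elements at random from a. With replace=True the same element may be drawn more than once. p is an array of selection probabilities, one per element of a.
a can be an array or an integer.
- If a is an array, samples are drawn from that array.
- If a is an integer, samples are drawn from np.arange(a).

Examples

arr = np.random.choice([1, 2, 3, 4, 5], 3)         # draw 3 values from [1, 2, 3, 4, 5]
print(arr)                                         # e.g. [4 1 2]
arr = np.random.choice([1, 2, 3, 4, 5], (2, 3))    # draw a 2 x 3 array from [1, 2, 3, 4, 5]
print(arr)
# e.g. [[1 2 4]
#       [1 3 2]]

arr = np.random.choice(5, 3)         # draw 3 values from [0, 5)
print(arr)                           # e.g. [0 4 2]
arr = np.random.choice(5, (2, 3))    # draw a 2 x 3 array from [0, 5)
print(arr)
# e.g. [[2 1 2]
#       [0 1 3]]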
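
The examples above leave replace and p at their defaults. As a minimal sketch of those two parameters (the seed, the labels, and the probabilities are illustrative assumptions, not values from these notes):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

# replace=False: each element can be drawn at most once
sample = np.random.choice([1, 2, 3, 4, 5], size=3, replace=False)
print(sample)

# p holds one probability per element of a and must sum to 1
weighted = np.random.choice(["a", "b", "c"], size=1000, p=[0.8, 0.1, 0.1])
share_a = (weighted == "a").mean()
print(share_a)  # close to 0.8
```

With replace=False, size cannot exceed the number of elements in a.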

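Shuffling in place: np.random.shuffle()

Shuffles x along its first axis (the outermost dimension). x itself is modified and the function returns None.

arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# arr = np.random.shuffle(arr)    # wrong: shuffle returns None, so arr would become None
np.random.shuffle(arr)            # correct: arr is shuffled in place
print(arr)                        # e.g. [4, 9, 6, 5, 3, 2, 7, 1, 8]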
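
To see what "along the first axis" means for a 2-D array, here is a small sketch (the 4 x 3 array is an illustrative assumption): the rows change places, but the contents of each row stay intact.

```python
import numpy as np

arr2d = np.arange(12).reshape(4, 3)   # rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]
result = np.random.shuffle(arr2d)     # reorders the rows in place
print(result)                         # None
print(arr2d)                          # same rows, in a random order
```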

Shuffled copy: np.random.permutation()

Returns a new array with the elements of x shuffled along the first axis; x itself is not modified.

arr = np.array([1, 2, 3, 4, 5])
arrNew = np.random.permutation(arr)
print(arr)       # [1 2 3 4 5]
print(arrNew)    # e.g. [2 1 5 4 3]

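Uniform distribution: np.random.uniform()

np.random.uniform(low=0.0, high=1.0, size=None)

Draws samples from the uniform distribution over the half-open interval [low, high).
low defaults to 0 and high defaults to 1; when size is omitted, a single value is returned.

np.random.uniform(1, 10, 5)
# array([2.28565255, 4.72959013, 2.84071081, 1.84718753, 9.08506286])   (sample output)
np.random.uniform(1, 10)
# 4.3275659408551235   (sample output)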

Random seed: np.random.seed(s)

For the same seed s, the same sequence of random numbers is generated.
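
A quick way to check this is to reseed and draw again. The sketch below uses an arbitrary seed of 42; the second half uses np.random.default_rng, the Generator interface that newer NumPy code tends to use, shown here only for comparison.

```python
import numpy as np

np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)            # reset to the same seed
second = np.random.rand(3)
print(np.array_equal(first, second))   # True

# Generator interface: each generator carries its own seed and state
gen_a = np.random.default_rng(42).random(3)
gen_b = np.random.default_rng(42).random(3)
print(np.array_equal(gen_a, gen_b))    # True
```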

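Normal (Gaussian) distribution: np.random.normal()

Returns normally distributed samples; loc is the mean, scale is the standard deviation, and size is the output shape.

np.random.normal(loc=0.0, scale=1.0, size=None)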
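
As a rough illustration of loc, scale, and size (the values 10 and 2, the sample count, and the seed are assumptions chosen for the example), the sample mean and standard deviation should land close to loc and scale:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability
samples = np.random.normal(loc=10.0, scale=2.0, size=100_000)
print(samples.shape)    # (100000,)
print(samples.mean())   # close to 10
print(samples.std())    # close to 2

# The same distribution built from standard normal samples
rescaled = 10.0 + 2.0 * np.random.standard_normal(100_000)
print(rescaled.mean())  # also close to 10
```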

Standard normal distribution: np.random.standard_normal()

np.random.standard_normal(size=None)

Basic attributes

Shape: shape

arr = np.array([1, 2, 3])
print(arr.shape)    # (3,)
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)    # (2, 3)

Number of elements: size

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.size)     # 6

Number of dimensions: ndim

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim)     # 2

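Indexing and slicing [:, :]

NumPy array indices start at 0.

1-D arrays

To index from the end, use a negative number; for example, -1 is the last element.

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(arr[0])          # 1
# last element
print(arr[-1])         # 9
# positions 3 up to, but not including, 6
print(arr[3:6])        # [4 5 6]
# everything from position 3 onward
print(arr[3:])         # [4 5 6 7 8 9]
# from position 3 up to, but not including, the second-to-last element
print(arr[3:-2])       # [4 5 6 7]
# select by a boolean condition
print(arr[arr > 3])    # [4 5 6 7 8 9]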

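2-D arrays

Row and column indices both start at 0.

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

# a single element
print(arr[1, 1])      # 5
# one row
print(arr[1])         # second row: [4 5 6]
# one column
print(arr[:, 1])      # second column: [2 5 8]
# the first and second columns (column index 2 is excluded)
print(arr[:, 0:2])
# [[1 2]
#  [4 5]
#  [7 8]]
# from the second row onward and from the second column onward
print(arr[1:, 1:])
# [[5 6]
#  [8 9]]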

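Element type: dtype

int        integer
float      floating-point number
object     Python object
string_    fixed-length byte string, usually written S; S10 is a string of length 10 (named np.bytes_ in NumPy 2.0 and later)
unicode_   fixed-length Unicode string, usually written U (named np.str_ in NumPy 2.0 and later)

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.dtype)    # int32 here; the default integer type is platform-dependent and is often int64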

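Type conversion: astype()

arr1 = np.arange(10)
print(arr1)          # [0 1 2 3 4 5 6 7 8 9]
print(arr1.dtype)    # int32 (platform-dependent; often int64)
# convert arr1 from int to float; the np.float alias was removed in NumPy 1.24, so use float or np.float64
arr2 = arr1.astype(np.float64)
print(arr2)          # [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
print(arr2.dtype)    # float64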
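
Conversion in the other direction can lose information. A small sketch (the sample values are assumptions) showing that float-to-int conversion truncates toward zero rather than rounding, and that numeric strings can be parsed with astype:

```python
import numpy as np

floats = np.array([1.7, -1.7, 2.0])
ints = floats.astype(np.int64)    # fractional part is dropped, not rounded
print(ints)                       # [ 1 -1  2]

strings = np.array(["1.5", "2.5"])
parsed = strings.astype(np.float64)   # numeric strings are parsed into floats
print(parsed)                         # [1.5 2.5]
```

astype always returns a new array; floats and strings are left unchanged.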

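Data preprocessing

Missing values: np.isnan()

Handling missing values takes two steps: first detect them, then fill them.

Missing values are detected with isnan(); NumPy represents a missing value as np.nan.

arr = np.array([1, 2, 3, np.nan, 5, 6])
print(arr)              # [ 1.  2.  3. nan  5.  6.]
# detect missing values
print(np.isnan(arr))    # [False False False  True False False]
# fill missing values with 0
arr[np.isnan(arr)] = 0
print(arr)              # [1. 2. 3. 0. 5. 6.]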
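
Filling with 0 is only one option. As a sketch of the same two steps with a different fill value (using the mean of the non-missing entries is an assumption made for illustration, not a recommendation from these notes):

```python
import numpy as np

arr = np.array([1, 2, 3, np.nan, 5, 6])

# Step 1: detect
has_missing = np.isnan(arr).any()

# Step 2: fill with the mean of the non-NaN values, (1+2+3+5+6)/5 = 3.4
fill_value = np.nanmean(arr)
filled = np.where(np.isnan(arr), fill_value, arr)   # builds a new array; arr is untouched
print(has_missing)
print(filled)
```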

Duplicate values: unique()

arr = np.array([1, 2, 3, 4, 3, 4])
arr = np.unique(arr)    # sorted unique values
print(arr)              # [1 2 3 4]

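Changing shape: reshape()

reshape changes the shape of an array, for example from (4, 3) to (3, 4).
The total number of elements must stay the same.

arr1 = np.arange(10)
print(arr1)    # [0 1 2 3 4 5 6 7 8 9]
arr2 = arr1.reshape(2, 5)
print(arr2)
# [[0 1 2 3 4]
#  [5 6 7 8 9]]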
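
The element-count rule can be seen directly. In this sketch (the array of 12 elements is an illustrative assumption), -1 asks NumPy to infer one dimension, and an incompatible shape raises ValueError:

```python
import numpy as np

arr = np.arange(12)
inferred = arr.reshape(3, -1)   # -1 is inferred as 12 / 3 = 4
print(inferred.shape)           # (3, 4)

error_message = None
try:
    arr.reshape(5, 3)           # 5 * 3 = 15 elements, but arr has 12
except ValueError as exc:
    error_message = str(exc)
print("reshape failed:", error_message)
```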

Transpose: T

Swaps the rows and columns of an array.

arr1 = np.array([[1, 2, 3], [7, 8, 9]])
print(arr1)
# [[1 2 3]
#  [7 8 9]]
arr2 = arr1.T
print(arr2)
# [[1 7]
#  [2 8]
#  [3 9]]

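Flattening: flatten()

flatten is a method of numpy.ndarray that returns a copy of the array collapsed into one dimension.
flatten works only on NumPy objects (array or matrix); a plain Python list has no such method.
a.flatten(): if a is an array, this reduces it to 1-D, row by row by default.
a.flatten().A: if a is a matrix, the flattened result is still a matrix (1 x N); .A (equivalent to .getA()) converts it to an array.
The order argument: 'C' is row-major, 'F' is column-major, 'A' is column-major if the array is Fortran-contiguous in memory and row-major otherwise, 'K' is the order in which the elements occur in memory.

For array objects

arr = np.array([[1, 2], [3, 4], [5, 6]])
print(arr)
# [[1 2]
#  [3 4]
#  [5 6]]
arr2 = arr.flatten()       # row-major by default (order='C')
arr2 = arr.flatten("A")    # also row-major here, because arr is C-contiguous
print(arr2)                # [1 2 3 4 5 6]
print(arr.flatten("F"))    # column-major: [1 3 5 2 4 6]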

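For matrix objects

arr = np.asmatrix([[1, 2, 3], [4, 5, 6]])    # np.mat was an alias of np.asmatrix; the alias was removed in NumPy 2.0
arr
# matrix([[1, 2, 3],
#         [4, 5, 6]])
arr.flatten()
# matrix([[1, 2, 3, 4, 5, 6]])
arr.flatten().A     # convert the matrix to an array
# array([[1, 2, 3, 4, 5, 6]])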

flatten cannot be used on a list; use a nested list comprehension instead

arr = [[1, 2], [3, 4], [5, 6]]
flat = [y for x in arr for y in x]
print(flat)    # [1, 2, 3, 4, 5, 6]

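Horizontal concatenation

Horizontal concatenation joins two arrays with the same number of rows side by side. Unlike merging DataFrames, concatenating NumPy arrays needs no common column; the two arrays are simply placed next to each other. There are three ways to do it: concatenate, hstack, and column_stack.

arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([[7, 8, 9], [10, 11, 12]])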

concatenate

# axis=1 joins the arrays along the column axis, that is, side by side
arr = np.concatenate([arr1, arr2], axis=1)
print(arr)
# [[ 1  2  3  7  8  9]
#  [ 4  5  6 10 11 12]]

hstack

arr = np.hstack((arr1, arr2))    # the arrays are passed as a tuple
print(arr)
# [[ 1  2  3  7  8  9]
#  [ 4  5  6 10 11 12]]

column_stack

arr = np.column_stack((arr1, arr2))    # the arrays are passed as a tuple
print(arr)
# [[ 1  2  3  7  8  9]
#  [ 4  5  6 10 11 12]]
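
For 2-D inputs the three approaches give the same result, but hstack and column_stack differ on 1-D inputs. A short sketch (the arrays x and y are illustrative assumptions):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

h = np.hstack((x, y))          # 1-D inputs are joined end to end
c = np.column_stack((x, y))    # 1-D inputs become the columns of a 2-D array

print(h, h.shape)              # [1 2 3 4 5 6] (6,)
print(c.shape)                 # (3, 2)
print(c[0])                    # [1 4], first element of x next to first element of y
```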

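Vertical concatenation

Vertical concatenation works like horizontal concatenation, except that arrays with the same number of columns are stacked on top of each other. There are again three ways to do it: concatenate, vstack, and row_stack.

concatenate

# axis=0 joins the arrays along the row axis, that is, one on top of the other
arr = np.concatenate([arr1, arr2], axis=0)
print(arr)
# [[ 1  2  3]
#  [ 4  5  6]
#  [ 7  8  9]
#  [10 11 12]]

vstack

arr = np.vstack((arr1, arr2))
print(arr)
# [[ 1  2  3]
#  [ 4  5  6]
#  [ 7  8  9]
#  [10 11 12]]

row_stack

arr = np.row_stack((arr1, arr2))    # alias of np.vstack; deprecated as of NumPy 2.0
print(arr)
# [[ 1  2  3]
#  [ 4  5  6]
#  [ 7  8  9]
#  [10 11 12]]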

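Array computation

Basic arithmetic

NumPy's arithmetic functions include add(), subtract(), multiply(), and divide(), which perform element-wise addition, subtraction, multiplication, and division.

The arrays must either have the same shape or be compatible under the broadcasting rules.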
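
A minimal sketch of the four functions and of the shape requirement (the arrays a and b are illustrative assumptions). A (2, 3) array combined with a (3,) array broadcasts; a (2, 3) array combined with a (2,) array does not:

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])                   # shape (3,), broadcast across both rows

total = np.add(a, b)            # same as a + b
difference = np.subtract(a, b)  # same as a - b
product = np.multiply(a, b)     # same as a * b
quotient = np.divide(a, b)      # same as a / b
print(total)
print(quotient)

# Shapes (2, 3) and (2,) are not broadcast-compatible
broadcast_failed = False
try:
    np.add(a, np.array([1.0, 2.0]))
except ValueError as exc:
    broadcast_failed = True
    print("broadcast error:", exc)
```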

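Cross product (vector product)

1. Definition:

The cross product of two vectors, also called the vector product, yields a vector rather than a scalar. The result is perpendicular to the plane spanned by the two vectors. In NumPy it is computed with np.cross.

2. Geometric meaning:

(1) In 3-D geometry, the cross product of a and b is a vector perpendicular to the plane containing a and b, commonly called the normal vector of that plane.

(2) For two vectors lying in a plane, |a×b| is numerically equal to the area of the parallelogram spanned by a and b.

The * operator and np.multiply do not compute the cross product; they multiply element by element, as the two examples below show.

Example 1:

a = np.array([1, 2])
b = np.array([3, 4])
print(a * b)    # [3 8], element-wise product

Example 2:

# broadcasting
a = np.array([[1, 2], [5, 6]])
b = np.array([10, 10])     # broadcast to [[10, 10], [10, 10]]
arr = np.multiply(a, b)    # equivalent to arr = a * b
print(arr)
# [[10 20]
#  [50 60]]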
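
For the cross product as defined in this section, NumPy provides np.cross. A small sketch with two 3-D vectors (the vectors are illustrative assumptions), checking the perpendicularity and area properties:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

c = np.cross(a, b)
print(c)                          # [-3  6 -3]

# c is perpendicular to both inputs: both dot products are 0
print(np.dot(c, a), np.dot(c, b))

# |a x b| is the area of the parallelogram spanned by a and b
area = np.linalg.norm(c)
print(area)                       # sqrt(54), about 7.35
```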

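Linear algebra

Function	Description
`dot`	Dot product of two arrays; for 2-D arrays this is the matrix product, e.g. shapes (2,3) x (3,2) give (2,2)
`vdot`	Dot product of two vectors
`inner`	Inner product of two arrays
`matmul`	Matrix product of two arrays
`linalg.det`	Determinant of an array
`linalg.solve`	Solves a linear matrix equation
`linalg.inv`	Multiplicative inverse of a matrix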
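
The last three entries live in the np.linalg submodule. A minimal sketch on a 2 x 2 system (the matrix A and the vector b are illustrative assumptions):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

det_A = np.linalg.det(A)      # 2*3 - 1*1 = 5
x = np.linalg.solve(A, b)     # solves A @ x = b, giving x = [0.8, 1.4]
A_inv = np.linalg.inv(A)

print(det_A)
print(x)
print(np.matmul(A, A_inv))    # approximately the identity matrix
```

For solving A x = b, np.linalg.solve is generally preferred over multiplying by the explicit inverse.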

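Dot product (inner product): np.dot()

Definition:

The dot product of two vectors (also called the inner product or scalar product) multiplies the vectors element by element and sums the results.

Geometric meaning:
(1) It characterizes, and can be used to compute, the angle between two vectors.
(2) It equals |a| times the length of the projection of b onto the direction of a.
(3) Formula: a·b = |a||b|cosθ

For 1-D arrays of length n, the dot product is the sum of the products of corresponding elements: a[0]*b[0] + a[1]*b[1] + ... + a[n-1]*b[n-1].
For 2-D arrays the dot product is the matrix product. If A and B are both 2 x 2, the result C is also 2 x 2 and is computed as follows.
C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0]: first row of A times first column of B, element by element, summed;
C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1]: first row of A times second column of B, element by element, summed;
C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0]: second row of A times first column of B, element by element, summed;
C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1]: second row of A times second column of B, element by element, summed.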
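
The row-times-column rule can be written out as explicit loops and compared with np.dot. This is a teaching sketch only; the helper name dot_2d is hypothetical, and np.dot should be used in practice.

```python
import numpy as np

def dot_2d(A, B):
    """Naive matrix product: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    rows, inner = A.shape
    inner_b, cols = B.shape
    if inner != inner_b:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((rows, cols), dtype=np.result_type(A, B))
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2], [3, 4]])
B = np.array([[11, 12], [13, 14]])
C = dot_2d(A, B)
print(C)
print(np.array_equal(C, np.dot(A, B)))   # True
```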

Example 1

# 1-D
a = np.array([1, 2, 3])
b = np.array([5, 6, 7])
arr = np.dot(a, b)
print(arr)    # 38 = 1*5 + 2*6 + 3*7

Example 2

# 2-D
a = np.array([[1, 2], [3, 4]])
b = np.array([[11, 12], [13, 14]])
print(np.dot(a, b))
# [[37 40]
#  [85 92]]
# computed as [[1*11+2*13, 1*12+2*14], [3*11+4*13, 3*12+4*14]]

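Element-wise functions

These functions apply the same operation to every element of an array.

abs       absolute value
sqrt      square root
square    square
exp       exponential with base e
log, log10, log2, log1p    logarithm to base e, base 10, base 2, and ln(1 + x)
modf      for floating-point values; returns the fractional and integer parts as two separate arrays
isnan     tests for NaN and returns booleans

arr = np.arange(10)
np.square(arr)    # squares
# array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])
np.sqrt(arr)      # square roots
# array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
#        2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])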
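
Two functions from the list that the example does not cover are modf and log1p. A short sketch (the input values are illustrative assumptions):

```python
import numpy as np

x = np.array([1.5, -2.25, 3.0])
frac, whole = np.modf(x)   # fractional parts come first, then integer parts
print(frac)                # 0.5, -0.25, 0.0 (each part keeps the sign of x)
print(whole)               # 1.0, -2.0, 3.0

small = 1e-10
accurate = np.log1p(small)     # ln(1 + x), computed accurately for tiny x
naive = np.log(1 + small)      # 1 + small is rounded before the log is taken
print(accurate, naive)
```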

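Statistical functions

sum               sum; axis=1 sums each row, axis=0 sums each column
mean              mean
std / var         standard deviation, variance
min / max         minimum, maximum
argmin / argmax   index of the minimum, index of the maximum
cumsum            cumulative sum, returned as an array
cumprod           cumulative product
maximum           element-wise maximum of two arrays

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]
# sum of the whole array
print(np.sum(arr))             # 45
# sum of each row
print(np.sum(arr, axis=1))     # [ 6 15 24]
# sum of each column
print(np.sum(arr, axis=0))     # [12 15 18]
# mean of each row
print(np.mean(arr, axis=1))    # [2. 5. 8.]
# mean of each column
print(np.mean(arr, axis=0))    # [4. 5. 6.]
# np.maximum(x1, x2): element-wise maximum; here each value is compared with 0
print(np.maximum([-2, -1, 0, 1, 2], 0))    # [0 0 0 1 2]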
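
The cumulative and index functions from the list can be sketched on the same 3 x 3 array. Note that without an axis argument they operate on the flattened array:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

running_sum = np.cumsum(arr)         # flattened running sum
running_prod = np.cumprod(arr[0])    # running product of the first row
print(running_sum)                   # [ 1  3  6 10 15 21 28 36 45]
print(running_prod)                  # [1 2 6]

print(np.argmax(arr))                # 8, index into the flattened array
print(np.argmax(arr, axis=1))        # [2 2 2], column index of each row's maximum

variance = np.var(arr)               # population variance: 60 / 9
print(variance, np.std(arr))
```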

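Conditional function: np.where()

Signature: np.where(condition, value_if_true, value_if_false), similar to the IF function in Excel. When only the condition is given, it returns a tuple of index arrays marking where the condition is true.

arr = np.array([55, 66, 77])
np.where(arr > 60, "pass", "fail")
# array(['fail', 'pass', 'pass'], dtype='<U4')
np.where(arr > 60)
# (array([1, 2]),)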

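Set operations

There are four main operations: membership, intersection, union, and difference.

arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([1, 2, 5])

# membership: is each element of arr1 present in arr2? (np.in1d is deprecated as of NumPy 2.0 in favor of np.isin)
np.in1d(arr1, arr2)
# array([ True,  True, False, False])
# intersection
np.intersect1d(arr1, arr2)
# array([1, 2])
# union
np.union1d(arr1, arr2)
# array([1, 2, 3, 4, 5])
# difference: elements of arr1 that are not in arr2
np.setdiff1d(arr1, arr2)
# array([3, 4])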
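
The membership test can also be written with np.isin, which keeps the shape of its first argument and so works on multi-dimensional arrays. A small sketch (the array grid and the list allowed are illustrative assumptions):

```python
import numpy as np

grid = np.array([[1, 2], [3, 4]])
allowed = [1, 2, 5]

mask = np.isin(grid, allowed)                    # boolean array with the same shape as grid
excluded = np.isin(grid, allowed, invert=True)   # True where the element is NOT in allowed

print(mask.shape)       # (2, 2)
print(grid[mask])       # [1 2]
print(grid[excluded])   # [3 4]
```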

II. NumPy: Other Topics