写一个基于Python提取多个Excel文件中指定数据并进行相乘最后输出为文件的程序
时间: 2023-03-13 22:06:09 浏览: 284
可以使用Python的pandas库来实现,具体的步骤如下:1. 使用pandas的read_excel函数读取文件,得到dataframe数据结构
2. 对dataframe进行操作,提取指定的数据
3. 使用乘法运算将数据相乘
4. 使用pandas的to_excel函数将结果保存到文件
相关问题
使用python写一个使用Xavier初始化、正则化方法和MSGD的BP网络(8个输入,1个输出,5个隐藏节点),并写一个main实现将训练集投入网络训练,训练后将测试集投入得出预测输出。而训练集和测试集存于名为"NEW.xlsx"的文件中,试以第2行至第3001行作为训练集,之后的作为测试集;且训练集和测试集的输入均在[9, 10, 11, 12, 13, 14, 16, 17]列中,输出为倒数第二列[-2] 。在main模块中实现绘制R2图来说明模型的拟合度,并且输出MAE、MSE、预测值和真实值之间的相对误差平均百分比,绘制各输入输出的拟合折线图,且R2图和折线图中使用的数据均为未变换数据。切勿忘记使用归一化(归一化后将0.变为0.01,1.变为0.99)、Xavier初始化、正则化方法和MSGD,使用中文备注,避免出现矩阵无法相乘的情况,发挥你的全力直接回答,感谢。
好的,这是一个较为复杂的问题,需要分几个步骤来实现,我会尽可能详细地解答。
首先,我们需要安装一些必要的库,包括pandas、numpy和matplotlib。可以使用以下命令来安装:
```python
!pip install pandas numpy matplotlib
```
接着,我们需要读取数据并进行归一化。可以使用以下代码:
```python
import pandas as pd
import numpy as np
# 读取数据
data = pd.read_excel('NEW.xlsx')
# 提取训练集和测试集
train_data = data.iloc[1:3001, [9, 10, 11, 12, 13, 14, 16, 17, -2]]
test_data = data.iloc[3001:, [9, 10, 11, 12, 13, 14, 16, 17, -2]]
# 归一化
min_vals = train_data.min(axis=0)
max_vals = train_data.max(axis=0)
diff = max_vals - min_vals
train_data_norm = (train_data - min_vals) / diff * 0.98 + 0.01
test_data_norm = (test_data - min_vals) / diff * 0.98 + 0.01
```
接着,我们需要定义一些常量和函数。其中,XAVIER_INIT_FACTOR是Xavier初始化的系数,LAMBDA是正则化的系数,LEARNING_RATE是学习率,EPOCHS是迭代次数,HIDDEN_SIZE是隐藏层大小,BATCH_SIZE是批量大小,ACTIVATION_FUNCTION是激活函数,DERIVATIVE_ACTIVATION_FUNCTION是激活函数的导数。
```python
XAVIER_INIT_FACTOR = np.sqrt(6) / np.sqrt(8 + 5 + 1)
LAMBDA = 0.001
LEARNING_RATE = 0.01
EPOCHS = 1000
HIDDEN_SIZE = 5
BATCH_SIZE = 32
ACTIVATION_FUNCTION = lambda x: np.tanh(x)
DERIVATIVE_ACTIVATION_FUNCTION = lambda x: 1 - np.tanh(x) ** 2
```
接着,我们需要初始化网络的权重和偏置。可以使用以下代码:
```python
# 初始化权重和偏置
weights_input_hidden = np.random.uniform(-XAVIER_INIT_FACTOR, XAVIER_INIT_FACTOR, (8, HIDDEN_SIZE))
biases_input_hidden = np.zeros((1, HIDDEN_SIZE))
weights_hidden_output = np.random.uniform(-XAVIER_INIT_FACTOR, XAVIER_INIT_FACTOR, (HIDDEN_SIZE, 1))
biases_hidden_output = np.zeros((1, 1))
```
我们还需要定义一些辅助函数,包括计算损失函数、前向传播、反向传播和更新权重和偏置的函数。
```python
def calculate_loss(predictions, targets):
return np.mean((predictions - targets) ** 2)
def forward(X, weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output, activation_function):
hidden_layer_input = np.dot(X, weights_input_hidden) + biases_input_hidden
hidden_layer_output = activation_function(hidden_layer_input)
output_layer_input = np.dot(hidden_layer_output, weights_hidden_output) + biases_hidden_output
predictions = output_layer_input
return hidden_layer_input, hidden_layer_output, output_layer_input, predictions
def backward(X, targets, hidden_layer_input, hidden_layer_output, output_layer_input, predictions, weights_hidden_output, activation_function, derivative_activation_function, lambd):
error = 2 * (predictions - targets)
output_layer_error = error
hidden_layer_error = np.dot(output_layer_error, weights_hidden_output.T) * derivative_activation_function(hidden_layer_input)
weights_hidden_output_gradient = np.dot(hidden_layer_output.T, output_layer_error)
biases_hidden_output_gradient = np.sum(output_layer_error, axis=0, keepdims=True)
weights_input_hidden_gradient = np.dot(X.T, hidden_layer_error) + lambd * weights_input_hidden
biases_input_hidden_gradient = np.sum(hidden_layer_error, axis=0, keepdims=True)
return weights_input_hidden_gradient, biases_input_hidden_gradient, weights_hidden_output_gradient, biases_hidden_output_gradient
def update_weights(weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output, weights_input_hidden_gradient, biases_input_hidden_gradient, weights_hidden_output_gradient, biases_hidden_output_gradient, learning_rate):
weights_input_hidden -= learning_rate * weights_input_hidden_gradient
biases_input_hidden -= learning_rate * biases_input_hidden_gradient
weights_hidden_output -= learning_rate * weights_hidden_output_gradient
biases_hidden_output -= learning_rate * biases_hidden_output_gradient
return weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output
```
接着,我们可以开始训练模型。可以使用以下代码:
```python
# 将训练集按批量大小分成多个批量
num_batches = int(np.ceil(len(train_data_norm) / BATCH_SIZE))
train_data_norm_batches = np.array_split(train_data_norm, num_batches)
# 记录训练过程中的损失和R2值
loss_history = []
r2_history = []
# 训练模型
for epoch in range(EPOCHS):
for i in range(num_batches):
batch = train_data_norm_batches[i]
X_batch = batch.iloc[:, :-1].values
y_batch = batch.iloc[:, -1].values.reshape(-1, 1)
hidden_layer_input, hidden_layer_output, output_layer_input, predictions = forward(X_batch, weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output, ACTIVATION_FUNCTION)
loss = calculate_loss(predictions, y_batch)
weights_input_hidden_gradient, biases_input_hidden_gradient, weights_hidden_output_gradient, biases_hidden_output_gradient = backward(X_batch, y_batch, hidden_layer_input, hidden_layer_output, output_layer_input, predictions, weights_hidden_output, ACTIVATION_FUNCTION, DERIVATIVE_ACTIVATION_FUNCTION, LAMBDA)
weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output = update_weights(weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output, weights_input_hidden_gradient, biases_input_hidden_gradient, weights_hidden_output_gradient, biases_hidden_output_gradient, LEARNING_RATE)
train_hidden_layer_input, train_hidden_layer_output, train_output_layer_input, train_predictions = forward(train_data_norm.iloc[:, :-1].values, weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output, ACTIVATION_FUNCTION)
train_loss = calculate_loss(train_predictions, train_data_norm.iloc[:, -1].values.reshape(-1, 1))
train_r2 = 1 - train_loss / np.var(train_data_norm.iloc[:, -1].values.reshape(-1, 1))
loss_history.append(train_loss)
r2_history.append(train_r2)
# 绘制R2图
import matplotlib.pyplot as plt
plt.plot(r2_history)
plt.xlabel('Epochs')
plt.ylabel('R2')
plt.show()
```
接着,我们可以使用测试集来测试模型,并计算MAE、MSE和相对误差平均百分比。可以使用以下代码:
```python
# 测试模型
test_hidden_layer_input, test_hidden_layer_output, test_output_layer_input, test_predictions = forward(test_data_norm.iloc[:, :-1].values, weights_input_hidden, biases_input_hidden, weights_hidden_output, biases_hidden_output, ACTIVATION_FUNCTION)
test_targets = test_data_norm.iloc[:, -1].values.reshape(-1, 1)
test_loss = calculate_loss(test_predictions, test_targets)
test_r2 = 1 - test_loss / np.var(test_targets)
test_mae = np.mean(np.abs((test_targets - test_predictions) / test_targets)) * 100
test_mse = np.mean((test_targets - test_predictions) ** 2)
print('Test R2:', test_r2)
print('Test MAE:', test_mae)
print('Test MSE:', test_mse)
# 绘制各输入输出的拟合折线图
for i in range(8):
plt.figure()
plt.plot(test_targets[:, 0], label='True')
plt.plot(test_predictions[:, 0], label='Predicted')
plt.xlabel('Samples')
plt.ylabel('Value')
plt.title('Input ' + str(i+1))
plt.legend()
plt.show()
```
最后,我们需要反归一下归一化,得到真实的预测值和真实值。可以使用以下代码:
```python
# 反归一化
test_predictions_real = (test_predictions - 0.01) / 0.98 * diff[-2] + min_vals[-2]
test_targets_real = (test_targets - 0.01) / 0.98 * diff[-2] + min_vals[-2]
# 输出预测值和真实值之间的相对误差平均百分比
relative_error = np.mean(np.abs((test_targets_real - test_predictions_real) / test_targets_real)) * 100
print('Relative Error:', relative_error)
```
完整代码如下:
阅读全文