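Based on the distribution-network equipment fault classification requirements, features, and training samples you provided, below is Python code for a DNN (deep neural network) model implemented with Keras and TensorFlow.
The model learns to predict the specific fault type (the second-level category, 二级分类) from the input electrical metering features.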
import io  # lets pandas read the inline CSV string as if it were a file
import numpy as np
import pandas as pd
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
# --- 1. Data loading and preparation ---
# Load the training samples you provided into a string.
# Note: in a real application you would probably load them from a CSV file or another data source.
csv_data = """一级分类,二级分类,计量UAB,计量UBC,计量UCA,计量IA,计量IB,计量IC,计量有功功率,计量无功功率,计量视在功率
短路故障,三相短路,1.17,49.18,6.91,435.17,535.89,485.42,836.19,522.28,595.28
短路故障,两相短路,58.38,82.43,3.12,422.23,389.95,349.98,526.23,565.71,676.45
短路故障,相间短路,79.06,77.85,72.74,512.79,538.00,287.32,735.30,529.08,895.38
短路故障,金属性短路,4.32,2.43,6.50,505.31,794.61,650.97,915.57,746.25,859.77
短路故障,电弧性短路,40.70,41.81,1.90,567.84,550.01,331.07,587.50,968.54,837.54
接地故障,单相接地,154.25,182.66,123.46,29.74,18.61,54.32,327.84,481.66,449.70
接地故障,低阻接地,34.24,40.78,43.25,415.27,423.49,405.31,703.53,602.88,807.60
接地故障,高阻接地,223.88,222.49,226.39,58.61,64.30,98.86,192.51,140.06,186.78
接地故障,间歇性接地,166.15,211.60,157.40,85.37,55.40,129.19,235.33,220.35,309.79
接地故障,中性点接地故障,31.43,41.89,33.89,2.65,24.16,13.50,0.00,0.00,0.00
失压故障,全电压失压,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
失压故障,部分失压,19.09,4.95,5.35,3.68,17.97,82.89,229.89,33.05,55.73
失压故障,电压骤降,79.10,76.87,70.34,197.59,149.91,290.81,340.54,532.50,525.93
失压故障,欠电压,175.43,179.97,172.98,102.94,133.96,74.76,326.44,464.43,234.45
失压故障,电压不平衡,177.70,178.86,147.53,64.16,138.45,120.85,394.81,420.57,263.45
过载故障,持续过载,226.38,205.75,204.42,282.89,213.22,263.08,1457.89,1292.88,1429.80
过载故障,短时过载,238.55,231.57,224.44,474.48,216.65,497.16,1457.95,923.04,1492.04
过载故障,变压器过载,204.60,228.96,212.34,498.62,459.74,453.83,1645.32,1754.39,1442.38
过载故障,线路过载,235.75,232.73,204.00,354.66,202.82,334.05,857.01,831.72,1465.23
过载故障,开关设备过载,226.65,227.94,221.35,155.53,283.83,216.01,730.80,821.15,964.58
断路故障,导线断线,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
断路故障,接头断开,21.83,18.10,20.27,24.15,0.10,27.77,111.37,43.63,102.05
断路故障,开关拒动,6.43,14.81,25.93,62.66,86.10,37.95,276.33,224.14,169.41
短路故障,三相短路,48.40,7.62,19.85,500.72,550.51,307.39,759.50,731.22,988.58
短路故障,两相短路,93.87,20.22,96.03,336.24,214.66,407.20,789.35,645.87,491.65
短路故障,相间短路,90.24,90.05,82.57,565.06,310.05,506.56,401.89,876.39,469.79
短路故障,金属性短路,7.42,0.94,2.06,992.94,764.69,675.77,1174.22,887.31,1163.59
短路故障,电弧性短路,16.97,42.62,7.84,682.36,547.54,680.82,594.98,637.95,979.76
接地故障,单相接地,173.20,162.17,153.83,55.11,75.21,45.69,510.81,595.26,303.61
接地故障,低阻接地,41.23,8.44,31.28,656.30,609.88,659.47,936.20,641.52,770.84
接地故障,高阻接地,221.97,239.04,221.12,52.87,87.38,73.55,162.70,288.05,101.32
接地故障,间歇性接地,204.87,212.98,199.20,54.06,119.21,116.59,284.96,308.95,266.22
接地故障,中性点接地故障,22.12,3.41,1.00,7.56,35.14,5.66,0.00,0.00,0.00
失压故障,全电压失压,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
失压故障,部分失压,43.33,23.56,7.92,53.64,69.80,99.17,31.33,11.80,81.19
失压故障,电压骤降,50.29,59.14,92.31,226.41,251.32,207.96,531.53,685.82,654.84
失压故障,欠电压,114.54,126.33,102.75,94.57,114.20,58.64,494.71,200.64,409.75
失压故障,电压不平衡,159.74,161.99,192.40,147.01,100.77,123.09,442.85,226.73,284.58
过载故障,持续过载,218.62,210.10,230.07,261.81,322.38,245.50,1480.19,1495.48,1064.44
过载故障,短时过载,229.99,222.45,221.84,455.41,200.05,254.28,1147.81,1478.67,1334.01
过载故障,变压器过载,224.67,213.34,203.70,307.84,444.85,548.82,1736.45,1711.38,1549.25
过载故障,线路过载,229.67,208.08,215.55,437.43,427.96,304.08,1341.47,984.54,1075.26
过载故障,开关设备过载,223.75,221.50,214.37,241.42,276.64,106.77,395.41,477.79,727.06
断路故障,导线断线,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
断路故障,接头断开,48.58,11.71,21.31,13.65,35.71,1.84,184.55,93.07,136.94
断路故障,开关拒动,18.45,45.45,2.03,53.55,19.13,13.48,123.44,174.87,136.36
短路故障,三相短路,46.68,20.34,26.29,466.02,484.27,475.84,874.96,860.56,613.29
短路故障,两相短路,48.51,44.13,64.53,466.07,330.51,499.30,442.47,635.64,670.00
短路故障,相间短路,57.31,79.93,75.90,207.28,526.43,313.21,818.51,782.86,430.30
短路故障,金属性短路,4.96,8.68,5.15,980.44,671.55,924.86,818.60,994.41,1081.50
短路故障,电弧性短路,23.22,29.38,6.76,438.98,389.29,329.20,740.70,845.74,662.92
接地故障,单相接地,147.33,137.34,157.72,30.07,25.04,75.52,537.09,593.95,324.78
接地故障,低阻接地,25.06,21.46,17.66,617.67,447.08,480.79,501.48,637.69,748.16
接地故障,高阻接地,200.33,206.31,232.04,96.41,75.88,61.52,132.71,298.01,143.08
接地故障,间歇性接地,174.32,182.25,158.64,58.68,61.25,140.98,328.44,200.25,261.17
接地故障,中性点接地故障,15.23,16.83,46.24,43.81,8.64,6.17,0.00,0.00,0.00
失压故障,全电压失压,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
失压故障,部分失压,13.14,8.29,5.58,30.02,58.68,40.56,293.87,16.95,279.63
失压故障,电压骤降,82.84,59.57,71.84,170.01,296.31,124.12,696.42,321.92,568.73
失压故障,欠电压,133.43,121.64,142.27,106.50,137.04,67.62,344.26,248.48,354.19
失压故障,电压不平衡,211.14,219.80,124.39,82.40,83.94,125.57,411.83,230.18,515.73
过载故障,持续过载,209.72,229.73,215.37,220.46,210.02,262.73,1169.64,995.29,1365.40
过载故障,短时过载,224.35,223.27,201.77,203.41,302.83,239.78,755.54,1178.80,984.40
过载故障,变压器过载,217.56,216.27,204.38,511.18,435.27,312.82,1162.23,1912.85,1913.47
过载故障,线路过载,217.97,232.38,219.96,324.91,308.95,485.10,987.45,1406.61,1352.23
过载故障,开关设备过载,216.74,200.99,209.37,100.94,264.31,230.47,965.71,930.66,394.62
断路故障,导线断线,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
断路故障,接头断开,28.32,40.11,35.41,42.35,39.99,30.29,125.85,168.02,224.99
断路故障,开关拒动,24.80,20.79,22.28,61.17,33.10,44.78,100.60,233.39,270.01
短路故障,三相短路,3.29,2.74,9.92,465.34,541.93,586.90,788.92,825.75,817.71
短路故障,两相短路,2.82,39.88,58.01,211.31,275.82,402.57,726.19,471.35,413.49
短路故障,相间短路,95.96,74.38,55.86,212.29,423.80,438.24,836.72,839.61,806.86
短路故障,金属性短路,0.28,2.30,4.12,666.70,938.97,581.72,1127.70,1072.32,776.94
短路故障,电弧性短路,43.33,27.52,6.95,448.12,658.23,691.04,888.29,536.44,700.50
接地故障,单相接地,187.65,141.79,186.96,37.82,60.68,53.56,339.36,362.99,460.08
接地故障,低阻接地,27.34,40.55,24.40,546.53,413.92,774.59,856.29,825.24,751.49
接地故障,高阻接地,218.58,219.74,226.64,89.51,84.62,50.08,211.77,190.67,208.16
接地故障,间歇性接地,157.63,174.99,194.98,116.39,118.38,144.29,298.83,243.57,361.94
接地故障,中性点接地故障,7.40,33.96,6.60,16.63,49.54,31.07,0.00,0.00,0.00
失压故障,全电压失压,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
失压故障,部分失压,21.78,37.63,29.22,42.53,83.19,78.51,159.08,180.06,262.81
失压故障,电压骤降,53.61,91.22,55.59,239.49,186.68,236.15,517.95,447.26,560.36
失压故障,欠电压,123.50,162.65,161.26,80.73,132.77,122.17,386.56,373.87,322.66
失压故障,电压不平衡,192.53,166.93,192.45,108.05,99.75,92.09,495.57,537.28,423.33
过载故障,持续过载,233.48,230.72,229.03,303.05,262.74,349.27,1320.28,1205.75,1243.48
过载故障,短时过载,226.39,232.93,238.98,345.35,367.29,215.36,1353.64,1302.16,1487.31
过载故障,变压器过载,229.21,221.01,202.78,435.16,449.94,359.41,1283.49,1895.34,1223.00
过载故障,线路过载,208.76,219.43,210.24,444.03,352.81,237.76,1226.76,1142.24,1122.67
过载故障,开关设备过载,225.75,220.39,205.25,275.90,278.72,99.29,303.31,472.24,864.77
断路故障,导线断线,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
断路故障,接头断开,46.53,21.65,35.72,32.63,16.26,22.23,80.68,201.12,181.92
断路故障,开关拒动,46.60,20.79,45.36,68.99,36.05,24.01,191.83,117.09,164.05
短路故障,三相短路,44.53,14.07,41.02,581.45,516.64,390.72,847.41,591.22,710.74
短路故障,两相短路,36.58,89.24,72.85,489.42,386.84,378.42,567.49,685.06,670.17
短路故障,相间短路,53.11,75.56,86.82,494.43,409.65,465.59,847.93,845.22,468.42
短路故障,金属性短路,6.64,2.58,4.24,733.30,826.09,984.27,1001.55,1170.83,1002.30
短路故障,电弧性短路,44.24,38.22,24.35,555.50,584.06,473.87,955.46,974.31,645.53
接地故障,单相接地,165.96,195.74,195.99,35.38,94.06,94.06,334.19,449.24,476.99
接地故障,低阻接地,9.62,14.61,14.53,483.83,433.95,434.42,625.92,700.58,691.63
接地故障,高阻接地,220.82,210.16,225.65,98.86,65.76,87.53,253.24,138.37,194.74
"""
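# Wrap the string in StringIO so pandas can read it like a file
data_io = io.StringIO(csv_data)
df = pd.read_csv(data_io)
# Define the feature columns and the target column
feature_cols = ['计量UAB', '计量UBC', '计量UCA', '计量IA', '计量IB', '计量IC', '计量有功功率', '计量无功功率', '计量视在功率']
target_col = '二级分类'  # predict the more specific second-level category
# Separate features (X) and target (y)
X = df[feature_cols]
y = df[target_col]
# Encode the target variable (text labels -> numbers)
# 1. Label encoding (text -> integer)
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)
num_classes = len(label_encoder.classes_)  # total number of classes
print(f"Number of fault classes: {num_classes}")
print(f"Class label mapping: {dict(zip(label_encoder.classes_, range(num_classes)))}")
# 2. One-hot encoding (integer -> one-hot vector), the format categorical_crossentropy expects
onehot_encoder = OneHotEncoder(sparse_output=False)  # dense array; sparse_output needs scikit-learn >= 1.2
y_onehot = onehot_encoder.fit_transform(y_encoded.reshape(-1, 1))
# Split into training and test sets (75% train, 25% test).
# A stratified split needs at least as many test rows as classes: with 23 classes and
# 100 samples, test_size=0.2 (20 test rows) makes train_test_split raise a ValueError.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_onehot, test_size=0.25, random_state=42,
    stratify=y_encoded,  # stratify on the integer labels to keep class proportions
)
# Feature scaling (very important, especially for neural networks)
# StandardScaler standardizes each feature to mean 0 and variance 1
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training set only
X_test_scaled = scaler.transform(X_test)  # reuse the training statistics on the test set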
# --- 2. Build the DNN model ---
input_dim = X_train_scaled.shape[1]  # number of features
model = keras.Sequential(
    [
        keras.Input(shape=(input_dim,)),  # input layer with an explicit input dimension
        keras.layers.Dense(128, activation="relu", name="hidden_layer_1"),  # first hidden layer: 128 units, ReLU
        keras.layers.Dropout(0.3),  # dropout to reduce overfitting
        keras.layers.Dense(64, activation="relu", name="hidden_layer_2"),  # second hidden layer: 64 units
        keras.layers.Dropout(0.3),  # dropout
        keras.layers.Dense(32, activation="relu", name="hidden_layer_3"),  # third hidden layer: 32 units
        keras.layers.Dense(num_classes, activation="softmax", name="output_layer"),  # one unit per class, softmax for multi-class output
    ]
)
# Print the model architecture
model.summary()
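# --- 3. Compile the model ---
# Configure the learning process
model.compile(
    loss="categorical_crossentropy",  # multi-class cross-entropy loss (y is one-hot encoded)
    optimizer="adam",  # Adam, a widely used gradient-based optimizer
    metrics=["accuracy"],  # evaluation metric: accuracy
)
# --- 4. Train the model ---
print("\nStarting model training...")
epochs = 150  # maximum number of epochs (adjust as needed)
batch_size = 16  # samples per batch (adjust as needed)
# EarlyStopping callback to limit overfitting:
# stop training if the validation loss has not improved for `patience` epochs
early_stopping = keras.callbacks.EarlyStopping(
    monitor='val_loss',  # watch the validation loss
    patience=15,  # epochs to wait without improvement
    restore_best_weights=True,  # roll back to the best weights seen
)
history = model.fit(
    X_train_scaled,
    y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.15,  # hold out part of the training set as a validation set
    callbacks=[early_stopping],  # use the callback
    verbose=1,  # training output (0: silent, 1: progress bar, 2: one line per epoch)
)
print("Model training finished.")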
# --- 5. Evaluate the model ---
print("\nEvaluating the model on the test set:")
loss, accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test loss: {loss:.4f}")
print(f"Test accuracy: {accuracy:.4f}")
# --- 6. Predict with the model (example) ---
print("\nExample prediction:")
# Suppose there is a new fault data point (same feature order as the training data)
# Note: these values are made up purely for demonstration
new_fault_data = pd.DataFrame([[
    50.0, 55.0, 5.0,      # UAB, UBC, UCA
    400.0, 350.0, 300.0,  # IA, IB, IC
    600.0, 500.0, 700.0   # P, Q, S
]], columns=feature_cols)
# Scale the new data with the scaler fitted earlier
new_fault_data_scaled = scaler.transform(new_fault_data)
# Run the prediction
predictions_proba = model.predict(new_fault_data_scaled)
predicted_class_index = np.argmax(predictions_proba, axis=1)
# Convert the predicted index back to the original fault class label
predicted_class_label = label_encoder.inverse_transform(predicted_class_index)
print(f"Predicted probability distribution for the new fault data:\n{predictions_proba}")
print(f"Predicted fault class index: {predicted_class_index[0]}")
print(f"Predicted fault class label: {predicted_class_label[0]}")
# --- Optional: plot loss and accuracy over the course of training ---
import matplotlib.pyplot as plt
def plot_history(history):
    # Plot training and validation accuracy
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('Model accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')
    # Plot training and validation loss
    plt.subplot(1, 2, 2)
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Validation'], loc='upper left')
    plt.tight_layout()
    plt.show()
# Fix for Chinese text rendering in Matplotlib (if needed)
# plt.rcParams['font.sans-serif'] = ['SimHei']  # or another font with Chinese glyphs
# plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly
plot_history(history)
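Code explanation:

Data loading and preparation:
- The code defines the feature columns (`feature_cols`) and the target column (`target_col`; the more detailed 二级分类, i.e. the second-level category, is used here).
- `LabelEncoder` converts the text fault categories into integers starting from 0.
- `OneHotEncoder` converts the integer classes into one-hot vectors (for example, with 5 classes, class 2 becomes `[0, 0, 1, 0, 0]`). This is the format Keras needs when the loss is `categorical_crossentropy`.
- `train_test_split` splits the data into a training set and a test set. The `stratify` argument keeps the class proportions in both sets roughly the same as in the original data.
- `StandardScaler` standardizes the features. This step is critical: it makes neural network training more stable and faster. Note that `fit_transform` is called only on the training set; the test set and new data use only `transform`.

Building the DNN model:
- `keras.Sequential` creates a sequential model.
- Input layer: defines the model's input shape, i.e. the number of features.
- `Dense` layers: fully connected layers.
  - `activation='relu'`: uses the ReLU activation function, a common choice for hidden layers.
  - `name`: names the layer, which helps with debugging and readability.
- `Dropout` layers: randomly set a fraction of the neuron outputs (30% here) to 0 during training, which helps keep the model from overfitting the training data.
- Output layer: the number of units, `num_classes`, must equal the total number of fault classes.
  - `activation='softmax'`: softmax is used for multi-class problems; it turns the outputs into a probability distribution whose values sum to 1.

Compiling the model:
- `loss='categorical_crossentropy'`: the loss function. Because the targets are one-hot encoded, categorical cross-entropy is used.
- `optimizer='adam'`: the optimizer. Adam is an efficient, widely used optimization algorithm.
- `metrics=['accuracy']`: the metric monitored during training and evaluation, here accuracy.

Training the model:
- `model.fit()`: starts training.
- `X_train_scaled`, `y_train`: the training data and the corresponding labels.
- `epochs`: the total number of training epochs.
- `batch_size`: the number of samples used per iteration (one weight update).
- `validation_split`: holds out part of the training data as a validation set, evaluated at the end of every epoch to help judge whether the model is overfitting.
- `callbacks=[early_stopping]`: uses the callback. `EarlyStopping` stops training early once validation performance stops improving, which limits overfitting, and can optionally restore the weights from the best epoch.
- `verbose=1`: shows a training progress bar.

Evaluating the model:
- `model.evaluate()`: evaluates the trained model on the test set (data the model has never seen). It returns the loss and the specified metric (accuracy).

Prediction:
- The previously fitted `scaler` applies `transform` to the new data.
- `model.predict()`: predicts on the processed new data. The output is the probability for each class.
- `np.argmax()`: finds the index of the class with the highest probability.
- `label_encoder.inverse_transform()`: converts the predicted index back to the original text label.

Plotting (optional):
- `plot_history` plots the training and validation accuracy and loss for each epoch.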
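
To make the encoding and prediction steps above concrete, here is a minimal pure-Python sketch of the arithmetic behind them: one-hot encoding, softmax, categorical cross-entropy, and the argmax decoding step. The five class names and the logits are made up for illustration, and the helper functions are hypothetical stand-ins written for this example, not the Keras or scikit-learn implementations.

```python
import math


def one_hot(index, num_classes):
    """Return a one-hot list with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # eps guards against log(0); the value is an assumption for this sketch
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_prob))


# Hypothetical 5-class setup; the names stand in for label_encoder.classes_.
classes = ["A", "B", "C", "D", "E"]
target = one_hot(2, len(classes))            # class index 2 -> [0, 0, 1, 0, 0]
probs = softmax([0.5, 1.0, 3.0, 0.2, -1.0])  # made-up output-layer scores
loss = categorical_crossentropy(target, probs)

# argmax picks the most probable class; indexing `classes` plays the role
# of label_encoder.inverse_transform.
predicted_index = max(range(len(probs)), key=probs.__getitem__)
predicted_label = classes[predicted_index]
print(target, [round(p, 3) for p in probs], round(loss, 4), predicted_label)
```

Because the target is one-hot, the loss reduces to the negative log of the probability assigned to the true class, so a confident correct prediction gives a loss close to 0.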
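
Running and tuning:
- Make sure `tensorflow`, `pandas`, `scikit-learn`, and `matplotlib` are installed (`pip install tensorflow pandas scikit-learn matplotlib`).
- Replace `csv_data` with your actual data loading (for example, `pd.read_csv('your_fault_data.csv')`).
- Tune `epochs`, `batch_size`, and the `Dropout` rate, or try other optimizers (such as `RMSprop` or `SGD`) and learning rates.

This model is a starting point; refine and adjust it further based on the results you observe and your requirements.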
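
When you replace `csv_data` with your own data, it may help to check the class counts before splitting. To my knowledge, scikit-learn rejects a stratified `train_test_split` when any class has fewer than two samples, or when the test (or training) set would contain fewer rows than there are classes. The sketch below mirrors those checks in plain Python; `check_stratified_split` is a hypothetical helper written for this example, and the labels are synthetic.

```python
import math
from collections import Counter


def check_stratified_split(labels, test_size):
    """Return the reasons a stratified split would be rejected (empty list if none)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    n_test = math.ceil(test_size * n_samples)  # a fractional test_size is rounded up
    n_train = n_samples - n_test
    problems = []
    rare = sorted(label for label, n in counts.items() if n < 2)
    if rare:
        problems.append(f"classes with fewer than 2 samples: {rare}")
    if n_test < n_classes:
        problems.append(f"test set has {n_test} rows but there are {n_classes} classes")
    if n_train < n_classes:
        problems.append(f"training set has {n_train} rows but there are {n_classes} classes")
    return problems


# Synthetic labels: 23 classes over 100 rows (8 classes appear 5 times, 15 appear 4 times).
labels = [f"class_{i}" for i in range(23)] * 4 + [f"class_{i}" for i in range(8)]
for fraction in (0.2, 0.25):
    print(fraction, check_stratified_split(labels, fraction) or "ok")
```

With many classes and few samples per class, a larger test fraction or more data per class is usually the simplest way to satisfy these constraints.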