对于data = np.array(dt1),np.array形式的csv文件dt1,倒数第1列数据是不是data[:,-1]
时间: 2023-06-27 08:06:25 浏览: 113
是的,对于一个二维的 NumPy 数组,可以使用 `array_name[:, -1]` 的方式来获取最后一列的数据。在你的例子中,`data = np.array(dt1)` 生成了一个二维的 NumPy 数组,因此 `data[:, -1]` 就是最后一列的数据。注意,这里的 `-1` 表示倒数第一列,如果要获取倒数第二列,可以使用 `-2`。
相关问题
解决import pandas as pd import numpy as np # 读取数据 train = pd.read_csv("D:\数据挖掘\实验\实验六\clean_train.csv") test = pd.read_csv("D:\数据挖掘\实验\实验六\clean_test.csv") # 抽取正负样本 train = train[train['label'] == 1].sample(sum(train['label'] == 1)).append(train[train['label'] == 0].sample(sum(train['label'] == 0))) test = test[test['label'] == 1].sample(sum(test['label'] == 1)).append(test[test['label'] == 0].sample(sum(test['label'] == 0))) # 删除列 x_train = train.drop(['user_id', 'merchant_id', 'coupon_id','date_received', 'date'], axis=1) x_test = test.drop(['user_id', 'merchant_id', 'coupon_id','date_received', 'date', 'label'], axis=1) # 处理无穷数据(无穷数据大或者无穷数据小) x_train[np.isinf(x_train)] = 0 x_test[np.isinf(x_test)] = 0from sklearn.tree import DecisionTreeClassifier model_dt1 = DecisionTreeClassifier(max_leaf_nodes=16, random_state=123).fit(x_train.drop(['label'], axis=1), x_train['label'])报错ValueError: Input contains NaN, infinity or a value too large for dtype('float32').问题
你在处理无穷数据时使用了 np.isinf() 函数,但是你并没有处理 NaN 值。你可以使用 Pandas 库中的 fillna() 函数来处理 NaN 值。下面是一个处理 NaN 和无穷数据的代码示例:
```python
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
# 读取数据
train = pd.read_csv("D:\数据挖掘\实验\实验六\clean_train.csv")
test = pd.read_csv("D:\数据挖掘\实验\实验六\clean_test.csv")
# 抽取正负样本
train = train[train['label'] == 1].sample(sum(train['label'] == 1)).append(train[train['label'] == 0].sample(sum(train['label'] == 0)))
test = test[test['label'] == 1].sample(sum(test['label'] == 1)).append(test[test['label'] == 0].sample(sum(test['label'] == 0)))
# 删除列
x_train = train.drop(['user_id', 'merchant_id', 'coupon_id','date_received', 'date'], axis=1)
x_test = test.drop(['user_id', 'merchant_id', 'coupon_id','date_received', 'date', 'label'], axis=1)
# 处理无穷数据和 NaN
x_train = x_train.replace([np.inf, -np.inf], np.nan)
x_train.fillna(0, inplace=True)
x_test = x_test.replace([np.inf, -np.inf], np.nan)
x_test.fillna(0, inplace=True)
# 训练决策树模型
model_dt1 = DecisionTreeClassifier(max_leaf_nodes=16, random_state=123).fit(x_train.drop(['label'], axis=1), x_train['label'])
```
这样处理之后,你的数据集中就不应该包含 NaN 和无穷值了。
SELECT t1.id, t1.[kh] AS '客户', t1.[lrsj] AS '录入时间', t1.[htbh] AS '合同编号', t1.[rq] AS '签订日期', t1.[htqsrq] AS '起始日期', t1.[htjzrq] AS '截止日期', t1.[zjey] AS '总金额', t1.[hjzld] AS '总吨数', t1.[lrsj] AS '创建时间', t1.[chpt] AS '撮合平台', CAST ( t1.[htwj] AS VARCHAR ( MAX ) ) AS '合同文件', STUFF( ( SELECT ',' + CONVERT ( VARCHAR, CASE WHEN ISNUMERIC( REPLACE( t2.ppcs, ',', '' ) ) = 1 THEN CAST ( REPLACE( t2.ppcs, ',', '' ) AS FLOAT ) END ) FROM uf_xsht_dt1 t2 WHERE t1.id = t2.mainid FOR XML PATH ( '' ) ), 1, 1, '' ) AS '品牌', STUFF( ( SELECT ' ' + CONVERT ( VARCHAR, CASE WHEN ISNUMERIC( REPLACE( t2.hsdjyd, ' ', '' ) ) = 1 THEN CAST ( REPLACE( t2.hsdjyd, ' ', '' ) AS FLOAT ) END ) FROM uf_xsht_dt1 t2 WHERE t1.id = t2.mainid FOR XML PATH ( '' ) ), 1, 1, '' ) AS '销售单价', (SELECT SUM(sjfk) FROM uf_xshksj2 WHERE cght = t1.id) AS '实际已收', (SELECT SUM(hkje) FROM uf_xshksj2 WHERE cght = t1.id) AS '已收货款', (SELECT SUM(sjfk) FROM uf_xshksj2 WHERE kxlx = 0 AND cght = t1.id) AS '订金', CASE WHEN ((SELECT SUM(ISNULL(hkje, 0)) FROM uf_xshksj2 WHERE cght = t1.id) + (SELECT SUM(ISNULL(sjfk, 0)) FROM uf_xshksj2 WHERE kxlx = 0 AND cght = t1.id)) < t1.[zjey] THEN (SELECT SUM(ISNULL(hkje, 0)) FROM uf_xshksj2 WHERE cght = t1.id) / t1.[zjey] ELSE ((SELECT SUM(ISNULL(hkje, 0)) FROM uf_xshksj2 WHERE cght = t1.id) + (SELECT SUM(ISNULL(sjfk, 0)) FROM uf_xshksj2 WHERE kxlx = 0 AND cght = t1.id)) / t1.[zjey] END AS '已收比例' FROM uf_xsht t1 INNER JOIN uf_xsht_dt1 t2 ON t1.id = t2.mainid LEFT JOIN uf_xshksj2 t3 ON t1.id = t3.cght GROUP BY t1.id, t1.[kh], t1.[lrsj], t1.[htbh], t1.[rq], t1.[htqsrq], t1.[htjzrq], t1.[zjey], t1.[hjzld], t1.[chpt], CAST ( t1.[htwj] AS VARCHAR ( MAX ) );为什么只有第一个有数据
根据你提供的查询语句,我注意到在计算已收比例的部分使用了一个 CASE 表达式。然而,在这个查询中,你使用了两个子查询来计算已收货款和订金,并在 CASE 表达式中引用了这两个子查询。这可能导致只有第一个子查询的结果被使用,而第二个子查询的结果被忽略。
为了解决这个问题,你可以通过将已收货款和订金的子查询合并成一个子查询,并在 CASE 表达式中引用该合并后的子查询来计算已收比例。以下是修改后的查询语句:
```sql
SELECT
t1.id,
t1.[kh] AS '客户',
t1.[lrsj] AS '录入时间',
t1.[htbh] AS '合同编号',
t1.[rq] AS '签订日期',
t1.[htqsrq] AS '起始日期',
t1.[htjzrq] AS '截止日期',
t1.[zjey] AS '总金额',
t1.[hjzld] AS '总吨数',
t1.[lrsj] AS '创建时间',
t1.[chpt] AS '撮合平台',
CAST ( t1.[htwj] AS VARCHAR ( MAX ) ) AS '合同文件',
STUFF(
(
SELECT
',' + CONVERT (
VARCHAR,
CASE
WHEN ISNUMERIC( REPLACE( t2.ppcs, ',', '' ) ) = 1 THEN
CAST ( REPLACE( t2.ppcs, ',', '' ) AS FLOAT )
END
)
FROM uf_xsht_dt1 t2
WHERE t1.id = t2.mainid FOR XML PATH ( '' )
),
1,
1,
''
) AS '品牌',
STUFF(
(
SELECT
' ' + CONVERT (
VARCHAR,
CASE
WHEN ISNUMERIC( REPLACE( t2.hsdjyd, ' ', '' ) ) = 1 THEN
CAST ( REPLACE( t2.hsdjyd, ' ', '' ) AS FLOAT )
END
)
FROM uf_xsht_dt1 t2
WHERE t1.id = t2.mainid FOR XML PATH ( '' )
),
1,
1,
''
) AS '销售单价',
t3.已收货款 AS '实际已收',
t3.订金 AS '已收货款',
(t3.已收货款 + t3.订金) / t1.[zjey] AS '已收比例'
FROM
uf_xsht t1
INNER JOIN uf_xsht_dt1 t2 ON t1.id = t2.mainid
LEFT JOIN (
SELECT
cght,
SUM(ISNULL(hkje, 0)) AS 已收货款,
SUM(CASE WHEN kxlx = 0 THEN ISNULL(sjfk, 0) ELSE 0 END) AS 订金
FROM
uf_xshksj2
GROUP BY
cght
) t3 ON t1.id = t3.cght
GROUP BY
t1.id,
t1.[kh],
t1.[lrsj],
t1.[htbh],
t1.[rq],
t1.[htqsrq],
t1.[htjzrq],
t1.[zjey],
t1.[hjzld],
t1.[chpt],
CAST ( t1.[htwj] AS VARCHAR ( MAX ) );
```
在上述查询中,我将已收货款和订金的子查询合并为一个子查询,并将其命名为 t3。然后,在 SELECT 子句中引用了 t3 中的已收货款和订金字段,并使用 `(t3.已收货款 + t3.订金) / t1.[zjey]` 计算已收比例。
请根据你的实际需求和数据结构进行适当的调整。希望这能解决你的问题!
阅读全文