根据数据文件titannic.zip,使用HIVE完成以下操作 统计获救与死亡情况 统计舱位分布情况 统计港口登船人数分布情况 统计性别与生存率的关系 统计客舱等级与生存率的关系 统计登船港口与生存率的关系
时间: 2024-05-13 11:19:42 浏览: 4
首先需要在Hive中创建表,然后导入数据文件,才能进行统计分析。
1. 创建表
在Hive中使用以下命令创建名为“titannic”的表:
```
CREATE TABLE titannic (
PassengerId INT,
Survived INT,
Pclass INT,
Name STRING,
Sex STRING,
Age DOUBLE,
SibSp INT,
Parch INT,
Ticket STRING,
Fare DOUBLE,
Cabin STRING,
Embarked STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```
2. 导入数据
使用LOAD DATA INPATH命令将titannic.zip文件中的数据导入到titannic表中:
```
LOAD DATA INPATH '/path/to/titannic.zip' OVERWRITE INTO TABLE titannic;
```
3. 统计获救与死亡情况
```
SELECT
CASE Survived
WHEN 0 THEN '死亡'
WHEN 1 THEN '获救'
ELSE '未知'
END AS status,
COUNT(*) AS count
FROM titannic
GROUP BY Survived;
```
4. 统计舱位分布情况
```
SELECT
CASE Pclass
WHEN 1 THEN '头等舱'
WHEN 2 THEN '二等舱'
WHEN 3 THEN '三等舱'
ELSE '未知'
END AS cabin_class,
COUNT(*) AS count
FROM titannic
GROUP BY Pclass;
```
5. 统计港口登船人数分布情况
```
SELECT
CASE Embarked
WHEN 'C' THEN 'Cherbourg'
WHEN 'Q' THEN 'Queenstown'
WHEN 'S' THEN 'Southampton'
ELSE '未知'
END AS port,
COUNT(*) AS count
FROM titannic
GROUP BY Embarked;
```
6. 统计性别与生存率的关系
```
SELECT
Sex,
SUM(Survived) AS survived_count,
COUNT(*) AS total_count,
1.0 * SUM(Survived) / COUNT(*) AS survival_rate
FROM titannic
GROUP BY Sex;
```
7. 统计客舱等级与生存率的关系
```
SELECT
CASE Pclass
WHEN 1 THEN '头等舱'
WHEN 2 THEN '二等舱'
WHEN 3 THEN '三等舱'
ELSE '未知'
END AS cabin_class,
SUM(Survived) AS survived_count,
COUNT(*) AS total_count,
1.0 * SUM(Survived) / COUNT(*) AS survival_rate
FROM titannic
GROUP BY Pclass;
```
8. 统计登船港口与生存率的关系
```
SELECT
CASE Embarked
WHEN 'C' THEN 'Cherbourg'
WHEN 'Q' THEN 'Queenstown'
WHEN 'S' THEN 'Southampton'
ELSE '未知'
END AS port,
SUM(Survived) AS survived_count,
COUNT(*) AS total_count,
1.0 * SUM(Survived) / COUNT(*) AS survival_rate
FROM titannic
GROUP BY Embarked;
```