

Journal of Intelligent Learning Systems and Applications, 2019, 11, 33-63

http://www.scirp.org/journal/jilsa

ISSN Online: 2150-8410

ISSN Print: 2150-8402

DOI: 10.4236/jilsa.2019.113003, Aug. 14, 2019

Predicting Credit Card Transaction Fraud Using Machine Learning Algorithms

Jiaxin Gao, Zirui Zhou, Jiangshan Ai, Bingxin Xia, Stephen Coggeshall

Hebei University of Economics and Business, Shijiazhuang, China

China University of Political Science and Law, Beijing, China

Wuhan Maple Leaf International School (High School), Wuhan, China

Wuhan Jinde Education Consulting, Co., Ltd., Wuhan, China

University of Southern California, Los Angeles, USA

Abstract

Credit card fraud is a wide-ranging issue for financial institutions, involving theft and fraud committed using a payment card. In this paper, we explore the application of linear and nonlinear statistical modeling and machine learning models on real credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent. We discuss the processes of data exploration, data cleaning, variable creation, feature selection, model algorithms, and results. Five different supervised models are explored and compared, including logistic regression, neural networks, random forest, boosted tree and support vector machines. The boosted tree model shows the best fraud detection result (FDR = 49.83%) for this particular data set. The resulting model can be utilized in a credit card fraud detection system. A similar model development process can be performed in related business domains such as insurance and telecommunications, to avoid or detect fraudulent activity.

Keywords

Credit Card Fraud, Machine Learning Algorithms, Logistic Regression, Neural Networks, Random Forest, Boosted Tree, Support Vector Machines

How to cite this paper: Gao, J.X., Zhou, Z.R., Ai, J.S., Xia, B.X. and Coggeshall, S. (2019) Predicting Credit Card Transaction Fraud Using Machine Learning Algorithms. Journal of Intelligent Learning Systems and Applications, 11, 33-63.
https://doi.org/10.4236/jilsa.2019.113003

Received: April 6, 2019
Accepted: August 11, 2019
Published: August 14, 2019

Copyright © 2019 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution-NonCommercial International License (CC BY-NC 4.0).
http://creativecommons.org/licenses/by-nc/4.0/

Open Access

1. Introduction

Credit card fraud, meaning theft and fraud committed using a payment card such as a credit card or debit card, remains an important issue. To combat it, many fraud detection algorithms are widely used in industry [1] [2] [3] [4]. Card fraud can happen through theft of the physical card as well as through compromise of the card (including skimming, breach, and account takeover), in which case the transaction would otherwise look legitimate. According to the Global Payments Report 2015 [5], the credit card was the most-used payment method globally in 2014, ahead of other methods such as e-wallets and bank transfers. Along with the rise of credit card usage, the number of fraud cases has also been steadily increasing [6]. The rise in credit card fraud has a large impact on the financial industry. Global credit card fraud in 2015 reached a staggering USD 21.84 billion [7].

Financial institutions today typically develop custom fraud detection systems targeted to their own portfolios [8]. Data mining and machine learning techniques are widely used to analyze patterns of normal and unusual behavior, as well as individual transactions, in order to flag likely fraud. Given this reality, the most cost-effective option is to tease out possible evidence of fraud from the available data using statistical algorithms [9]. Supervised models trained on labeled data examine all previously labeled transactions to determine mathematically what a typical fraudulent transaction looks like, and assign a fraud probability score to each transaction [10]. Among the supervised algorithms typically used, the neural network is popular, and support vector machines (SVMs) have been applied, as well as decision trees and other models [3] [9] [11]-[21]. However, little attention has been devoted in the literature to a comparison of all the common algorithms, particularly using real data sets.

In this paper, we explore the application of various linear and nonlinear statistical modeling and machine learning models on credit card transaction data. The models built are supervised fraud models that attempt to identify which transactions are most likely fraudulent.

2. Description of Data

The data available for this research project are a collection of credit card transactions from a government agency located in Tennessee, U.S.A. The particular agency is not known.

The data consist of 96,753 credit card transactions during the year 2010, with 1059 labeled as fraud. The file contains the following fields:

• Record: A unique identifier for each data record. This field also represents the time order;
• Cardnum: The account number for the transactions (we note that they are Mastercard transactions, since the account numbers begin with the digits 54);
• Date: The date of the transaction. Month, day and year only (no time of day);
• Merchnum: A typically 12-digit merchant identification number;
• Merch Description: A brief text description field of the merchant, typically around 20 characters;
• Merch State: The state of the address for the merchant;
• Merch Zip: The zip code of the merchant;
• Transtype: A code denoting the type of transaction;
• Amount: The dollar amount of the transaction;
• Fraud: A label for the transaction to indicate whether or not it was a fraudulent transaction.
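To make the record layout concrete, the following is a minimal sketch, using only the Python standard library, of how rows with these ten fields might be parsed into typed values. The two sample rows, the exact column header spellings, the date format string, and the helper name `parse_row` are illustrative assumptions and are not taken from the actual file; only the counts 96,753 and 1059 come from the text.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample in the layout described above; values are made up.
SAMPLE = """Record,Cardnum,Date,Merchnum,Merch Description,Merch State,Merch Zip,Transtype,Amount,Fraud
1,5400000000,1/1/10,000000000001,EXAMPLE MERCHANT A,TN,38118,P,3.62,0
2,5400000001,1/2/10,,EXAMPLE MERCHANT B,,,P,250.00,1
"""

def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "record": int(row["Record"]),            # unique id, also time order
        "cardnum": row["Cardnum"],               # identifier: keep as text
        "date": datetime.strptime(row["Date"], "%m/%d/%y").date(),
        "merchnum": row["Merchnum"] or None,     # may be missing
        "merch_description": row["Merch Description"],
        "merch_state": row["Merch State"] or None,  # may be missing
        "merch_zip": row["Merch Zip"] or None,      # may be missing
        "transtype": row["Transtype"],
        "amount": float(row["Amount"]),          # the only numeric field
        "fraud": row["Fraud"] == "1",            # label as a boolean
    }

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]

# Labeled fraud rate implied by the counts reported in the text.
fraud_rate = 1059 / 96753
print(f"{fraud_rate:.2%}")  # about 1.09%
```

The identifiers (Cardnum, Merchnum, Merch Zip) are kept as strings in this sketch so that leading zeros survive and so that they are treated as categories rather than quantities. The implied fraud rate of roughly 1.09% shows how imbalanced the two classes are.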

Table 1 shows summary information about all the fields. Only the Amount field is a numeric type field; the other fields are all categorical or text. The statistics marked with an asterisk in the table were calculated with the outliers eliminated. Three fields have some missing values: Merchnum, Merch state, and Merch zip. The Merch state field has 227 unique values, which is unexpected because the U.S. has only 50 states. Some of the values in this field might be from other countries, such as Canada and/or Mexico.
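The kind of per-field summary reported in Table 1 (records that have a value, percent populated, mode, and number of unique values) can be sketched as follows. The function name `summarize_field`, the convention that missing entries are `None` or an empty string, and the sample values are assumptions made for illustration; this is not the authors' code.

```python
from collections import Counter

def summarize_field(values):
    """Return Table 1 style statistics for one column.

    `values` is a list in which missing entries are None or "".
    """
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    mode = counts.most_common(1)[0][0] if counts else None
    percent = 100.0 * len(present) / len(values) if values else 0.0
    return {
        "populated": len(present),      # records that have a value
        "percent_populated": percent,   # share of all records
        "mode": mode,                   # most frequent value
        "unique": len(counts),          # number of distinct values
    }

# Made-up merchant state values, including two missing entries.
states = ["TN", "TN", "VA", "", "ON", "TN", None, "VA"]
summary = summarize_field(states)
print(summary)
```

A unique-value count well above the expected number of categories, as with the 227 values of Merch state, is the sort of anomaly this summary is meant to surface.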

Below we show some further information about the data.

Figure 1 shows the number of transactions each month. There is a general upward trend through September, followed by a sharp drop in October. Monthly transactions are fewer in the last quarter of the year than in the other quarters. This is due to the government fiscal year, which starts on October 1; people tend to be more cautious with their money in the first few months of the new fiscal year.
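A monthly count like the one plotted in Figure 1 can be computed directly from the Date field. The sketch below is an illustration under assumptions: the function names and the six sample dates are made up, and the real data would supply one date per transaction.

```python
from collections import Counter
from datetime import date

def transactions_per_month(dates):
    """Return a 12-element list of transaction counts, January first."""
    counts = Counter(d.month for d in dates)
    return [counts[m] for m in range(1, 13)]

def quarter_totals(monthly):
    """Sum twelve monthly counts into four calendar-quarter totals."""
    return [sum(monthly[i:i + 3]) for i in range(0, 12, 3)]

# Made-up dates for illustration only.
sample_dates = [
    date(2010, 1, 5), date(2010, 1, 20),
    date(2010, 9, 1), date(2010, 9, 15), date(2010, 9, 30),
    date(2010, 10, 2),
]
monthly = transactions_per_month(sample_dates)
quarters = quarter_totals(monthly)
print(monthly)
print(quarters)
```

Comparing the fourth entry of `quarters` with the first three is one simple way to check the last-quarter drop described above.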

Figure 2 shows the top 10 most frequent merchant descriptions. The total transaction frequency of the top 15 categories is 13,256, which is about 13.7% of the records. The top 200 categories account for 41% of the total records. In Table 1, we see that there are 13,126 distinct values of the Merch description field, and 48.6% of the merchant descriptions occurred only once.
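The concentration figures quoted above (the share of records covered by the most frequent descriptions, and the share of descriptions that occur only once) can be sketched as below. The function names and the toy list of descriptions are assumptions for illustration; the final line only re-checks the arithmetic of the reported 13,256 out of 96,753 records.

```python
from collections import Counter

def top_k_share(values, k):
    """Fraction of all records covered by the k most frequent values."""
    counts = Counter(values)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(values)

def singleton_share(values):
    """Fraction of distinct values that occur exactly once."""
    counts = Counter(values)
    once = sum(1 for n in counts.values() if n == 1)
    return once / len(counts)

# Toy merchant descriptions: A occurs 5 times, B 3 times, C and D once each.
descriptions = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
share_top2 = top_k_share(descriptions, 2)
share_once = singleton_share(descriptions)
print(share_top2, share_once)

# Arithmetic check of the figure quoted in the text.
reported_share = 13256 / 96753
print(f"{reported_share:.1%}")  # about 13.7%
```

Note that the two measures use different denominators: the first is a share of records, the second a share of distinct descriptions.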

Figure 3 depicts the top 10 of the most frequently observed merchant states.

Table 1. Summary description of the data set.

Field name          Type          Records that have a value   Percent populated   Mode           # Unique values
Record              Index         96,753                      100%                -              96,753
Cardnum             Categorical   96,753                      100%                5142148452     1645
Date                Time          96,753                      100%                2/28/10        365
Merchnum            Categorical   93,378                      96.5%               930090121224   13,091
Merch description   Categorical   96,753                      100%                GSA-FSS-ADV    13,126
Merch state         Categorical   95,558                      98.8%               TN             227
Merch zip           Categorical   92,097                      95.2%               38118          4567
Transtype           Categorical   96,753                      100%                P              4
Amount              Numeric       96,753                      100%                3.62           34,909
Fraud               Categorical   96,753                      100%                0              2

Amount statistics: Mean 395.33*, Max 30372.46*, Min 0.01, Std 814.74*.

*Statistical magnitude without outliers.
