

Open Journal of Statistics, 2017, 7, 859-875

http://www.scirp.org/journal/ojs

ISSN Online: 2161-7198

ISSN Print: 2161-718X

DOI: 10.4236/ojs.2017.75061, Oct. 31, 2017

Using Boosted Regression Trees and Remotely Sensed Data to Drive Decision-Making

Brigitte Colin, Samuel Clifford, Paul Wu, Samuel Rathmanner, Kerrie Mengersen

School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia

Abstract

Challenges in Big Data analysis arise due to the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely Boosted Regression Tree (BRT), can address Big Data challenges to drive decision making. The challenge of this study is lack of interoperability since the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. By merging the data sources together, a structured but noisy input file, showing inconsistencies and redundancies, was created. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT has the advantage of dealing with missing data by default by allowing a split on whether or not a value is missing as well as what the value is. Most importantly, the BRT offers a wide range of possibilities regarding the interpretation of results, and variable selection is automatically performed by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms these in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real world scenarios. For example, a single or ensemble approach of BRT could be tested with existing models in order to improve results for a wide range of data-driven decisions and applications.

Keywords

Boosted Regression Trees, Remotely Sensed Data, Big Data Modelling Approach, Missing Data

How to cite this paper: Colin, B., Clifford, S., Wu, P., Rathmanner, S. and Mengersen, K. (2017) Using Boosted Regression Trees and Remotely Sensed Data to Drive Decision-Making. Open Journal of Statistics, 7, 859-875. https://doi.org/10.4236/ojs.2017.75061

Received: September 27, 2017
Accepted: October 28, 2017
Published: October 31, 2017

Copyright © 2017 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/

Open Access

B. Colin et al.


1. Background

Data are typically stored in various ways and various formats, mostly in monolithic software architectures which do not allow for interoperability. Analysis of data across multiple data sources is thus difficult, since the functionality of the single data sources with respect to input and output, maintenance, data processing, error handling and user interface is all interwoven and acts as architecturally separate components. In order to create a basis for analysing the data considered here, it was required to extract the datasets from their original databases and combine them to form a common input file for the modelling process. It was therefore inevitable that this resulted in a data file structure which showed missing data, inconsistencies, duplicates and redundancies.

A case study is presented here to examine land use data sourced from a GIS, direct observations from an agricultural company, and remotely sensed data. The data were extracted from a relational database, Excel spreadsheets, remotely sensed imagery stored as raster data, and vector data from a Geographic Information System (GIS), together with data directly observed and measured in real time and interpolated data. By combining these data sources to form one common basis for our analysis, issues of data volume, variety and veracity were encountered. Big Data research clearly deals with issues beyond volume and belongs not only to the ongoing digital revolution, but to the scientific revolution as well. The question posed of Big Data, and illustrated in the case study presented here, is whether new knowledge can be extracted from various data sources that have not been analysed in combination before, and can thus assist in better and more confident decision making.
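The effect of combining sources into one common input file can be illustrated with a minimal sketch. The three record lists, the column names and the `(paddock, month)` key are all hypothetical and chosen only for illustration; they are not the structure of the data used in the case study. The sketch shows how an outer join over heterogeneous sources naturally produces missing values and exposes duplicates.

```python
from collections import defaultdict

# Hypothetical records from three sources, keyed by (paddock, month).
# All names and values are invented for illustration only.
gis_rows = [
    {"paddock": "P1", "month": "2015-01", "area_ha": 1200.0},
    {"paddock": "P2", "month": "2015-01", "area_ha": 850.0},
]
company_rows = [
    {"paddock": "P1", "month": "2015-01", "head_count": 310},
    {"paddock": "P1", "month": "2015-01", "head_count": 310},  # duplicate entry
]
satellite_rows = [
    {"paddock": "P2", "month": "2015-01", "green_fraction": 0.42},
]

COLUMNS = ["area_ha", "head_count", "green_fraction"]


def merge_sources(*sources):
    """Outer-join all sources on (paddock, month); absent fields stay None."""
    merged = defaultdict(lambda: {col: None for col in COLUMNS})
    duplicates = 0
    for rows in sources:
        seen = set()
        for row in rows:
            key = (row["paddock"], row["month"])
            if key in seen:
                duplicates += 1  # same key repeated within one source
                continue
            seen.add(key)
            for col in COLUMNS:
                if col in row:
                    merged[key][col] = row[col]
    return dict(merged), duplicates


table, n_duplicates = merge_sources(gis_rows, company_rows, satellite_rows)
for key in sorted(table):
    print(key, table[key])
print("duplicates dropped:", n_duplicates)
```

Each paddock ends up with at least one `None` field because no single source covers every column, which is the kind of missingness the modelling step then has to tolerate.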

2. Introduction

There is an exponential increase in interest in the use of digital data to improve decision making in a range of areas such as human systems, urban environments, agriculture and national security. For example, decisions in the agricultural domain may require information based on vegetation or land use change, estimation of crops or biomass, distribution of native or exotic species, livestock or weed assessment and so on. One source of digital data that has generated intense interest over the past decades is remotely sensed imagery. These data are available from a wide range of sources, ranging from satellites to drones, and have been used for a very wide range of environmental applications [1]-[8].

The availability and resolution of these data, combined with improved computer storage and data management facilities, have greatly increased the opportunity for mathematicians and statisticians to utilise this information in their models and analyses. The challenge in linking remotely sensed data to decision-making is that there are multiple steps in the process. Here, we focus on an exemplar real-world problem in the livestock industry: deciding on the allocation of animals to different paddocks and potentially different grazing properties based on the predicted availability of grass over the year. This problem arose in the context of collaboration between statisticians at the Queensland University of Technology and a large livestock organisation in Australia. The specific aim of the project was to develop an ensemble of models to predict the carrying capacity, that is, the number of animals that can be sustained on a paddock. In order to achieve this goal we utilised remote sensing data and supporting information about climate and paddock characteristics. Further, it was important to present the results in a form that is useful for the agricultural decision makers.

Difficult or challenging decisions demand thorough consideration, and even then they imply uncertainty, complexity and different levels of risk. Making the right decisions at the right time can lead to success, increase of profit or minimisation of risk. It is thus important that thoughtful consideration is put into each decision. Figure 1 demonstrates the workflow following a Big Data approach for our case study. Here, we use structured but heterogeneous data sources that showed characteristics like missing data, noise and redundancies. All the data sources were used to create a BRT model via an ensemble approach. The resulting model and its output serve as a foundation for better decision making. The steps involved in the process are depicted in Figure 1. Due to commercial confidentiality concerns, the final results of the modelling workflow are not presented here.

In this article we focus on one component of the ensemble modelling approach employed in the project, namely the use of BRT to estimate so-called animal equivalents per paddock. Since calves, cows and bulls of different ages consume different amounts of grass, these animals are standardised to a reference animal which can then be used as a common response variable in the analysis. An interesting conundrum is that one of the major inputs into such a model is the amount of grass, or more generally the biomass, in a paddock. This can potentially be estimated directly from remote sensing, but is confounded by the fact that animals are on the paddock eating the very thing that is being measured by the sensor. Moreover, the decision maker may be interested in the biomass estimates themselves, either directly via the remotely sensed measurements or indirectly via the animal equivalents based on animal weight and metabolic formula.
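The standardisation to a reference animal can be sketched as follows. The text does not state which metabolic formula was used, so this example assumes a simple metabolic-weight scaling: the exponent 0.75 and the 450 kg reference weight are illustrative assumptions, and the herd composition is invented.

```python
REFERENCE_WEIGHT_KG = 450.0   # assumed reference animal weight (illustrative)
METABOLIC_EXPONENT = 0.75     # assumed metabolic scaling exponent (illustrative)


def animal_equivalent(weight_kg):
    """Scale one animal to the reference animal via metabolic weight."""
    return (weight_kg / REFERENCE_WEIGHT_KG) ** METABOLIC_EXPONENT


def paddock_equivalents(herd):
    """Sum animal equivalents over (weight_kg, head_count) groups."""
    return sum(animal_equivalent(weight) * count for weight, count in herd)


# Hypothetical herd: calves, cows and bulls with invented weights and counts.
herd = [(150.0, 40), (450.0, 100), (700.0, 5)]
total = paddock_equivalents(herd)
print(f"animal equivalents on paddock: {total:.1f}")
```

Under this assumed scaling a light animal counts for more than its share of body weight and a heavy animal for less, so the paddock total differs from a plain head count or a plain weight sum.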

BRT is a popular statistical and machine learning approach that has not yet seen much application in the analysis of remotely sensed data. Indeed, although BRT was first defined two decades ago, it has only recently been extended to deal with the types of features that are characteristic of remotely sensed data, in particular their spatial and temporal dynamics. Most of the activity around the use of BRT for agricultural and environmental applications does not appear in the mainstream mathematical and statistical literature.
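To make the idea concrete, the following is a minimal pure-Python sketch of boosting with one-split trees (stumps) under squared error. It is not the implementation used in the study. It illustrates two properties mentioned in the abstract: a split may depend on whether a value is missing (here encoded as `None`), and a variable's influence can be summarised by how often it is chosen for a split. The data, function names and parameter values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values) if values else 0.0


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def goes_left(value, threshold, missing_left):
    """Route one value; threshold=None means a pure 'missing vs observed' split."""
    if value is None:
        return missing_left
    if threshold is None:
        return not missing_left
    return value <= threshold


def fit_stump(X, residuals):
    """Return the single split that minimises squared error of the residuals."""
    best = None
    for j in range(len(X[0])):
        observed = sorted({row[j] for row in X if row[j] is not None})
        thresholds = [(a + b) / 2 for a, b in zip(observed, observed[1:])] + [None]
        for t in thresholds:
            for missing_left in (True, False):
                left, right = [], []
                for row, r in zip(X, residuals):
                    (left if goes_left(row[j], t, missing_left) else right).append(r)
                if not left or not right:
                    continue
                loss = sse(left) + sse(right)
                if best is None or loss < best[0]:
                    best = (loss, j, t, missing_left, mean(left), mean(right))
    return best


def fit_brt(X, y, n_trees=50, learning_rate=0.1):
    """Fit stumps sequentially, each one to the residuals of the current model."""
    base = mean(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        _, j, t, missing_left, left_value, right_value = stump
        stumps.append((j, t, missing_left, left_value, right_value))
        for i, row in enumerate(X):
            step = left_value if goes_left(row[j], t, missing_left) else right_value
            predictions[i] += learning_rate * step
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    total = base
    for j, t, missing_left, left_value, right_value in stumps:
        step = left_value if goes_left(row[j], t, missing_left) else right_value
        total += learning_rate * step
    return total


def split_counts(model, n_features):
    """Count how often each feature was chosen to define a split."""
    counts = [0] * n_features
    for j, *_ in model[2]:
        counts[j] += 1
    return counts


# Invented data: feature 0 is informative and sometimes missing, feature 1 is not.
X = [[0.1, 5.0], [0.2, 3.0], [None, 4.0], [0.8, 5.0],
     [0.9, 3.0], [None, 4.0], [0.15, 4.0], [0.85, 4.0]]
y = [1.0, 1.2, 5.0, 3.0, 3.1, 5.2, 1.1, 2.9]

model = fit_brt(X, y)
print("split counts per feature:", split_counts(model, 2))
print("prediction for a row with a missing value:", round(predict(model, [None, 4.0]), 2))
```

Rows with a missing value are never dropped or imputed here; they are simply routed to whichever side of a split reduces the error most. Production libraries use deeper trees, subsampling and more refined influence measures than a raw split count.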

Figure 1. Modelling process for case study.


2.1. Case Study

The study area is located in the Northern Territory, Australia. The main climate zone is identified as grassland with hot dry summers and mild winters [9]. It is a heterogeneous region with complex topography, land cover and grassland types. Identification, differentiation and quantitative estimation of biomass are of primary interest in this case study. A range of data from different sources was required for this problem. In this section, we describe the information derived from Landsat imagery and comment briefly on other data. The reflectance recorded by the Landsat sensor is stored as an 8 bit value, resulting in a scale of 256 different grey values ranging from black (0, maximum absorption) to white (255, maximum reflection). The electronically recorded data appear as an array of numbers in digital format. In addition to the 8 bit quantisation, Landsat offers several spectral bands in the visible and infrared regions of the electromagnetic spectrum, in which each individual pixel shows different values across different bands. This means that each pixel has a different dimension and therefore will be represented differently in each spectral band. Raster data are becoming increasingly common and increasingly large in volume, although it is possible to reduce file size with compression functions.
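The 8 bit scale can be illustrated with a small sketch. It assumes a simple linear mapping from a reflectance in [0, 1] to a grey value in [0, 255]; real sensor calibration is more involved, and the band names and reflectance values below are invented.

```python
GREY_LEVELS = 2 ** 8  # 8 bits give 256 distinct grey values, 0 to 255


def quantise_8bit(reflectance):
    """Map a reflectance in [0, 1] to an integer grey value in [0, 255]."""
    clipped = min(max(reflectance, 0.0), 1.0)
    return round(clipped * (GREY_LEVELS - 1))


# Hypothetical reflectances of one pixel in four spectral bands.
pixel_reflectance = {"blue": 0.05, "green": 0.08, "red": 0.06, "near_infrared": 0.45}
pixel_grey = {band: quantise_8bit(r) for band, r in pixel_reflectance.items()}
print(pixel_grey)
```

The same pixel therefore carries one grey value per band, which is why a multi-band scene is naturally held as a stack of integer arrays.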

There is a strong advantage in using remotely sensed Landsat imagery and applied spectroscopy for these types of analyses because the data are freely available, the imagery covers a wide geographical range, and it avoids expensive, extensive and often impractical in-situ measurement. However, the trade-off is in resolution: in-situ measurements provide highly localised accuracy whereas a pixel in a Landsat image covers an area of 30 × 30 meters. It is noted that other satellites are now able to provide higher resolution, but these are not yet freely available for the areas of interest in this case study.

Estimation of biomass using satellite data is of ongoing global interest. Grass biomass estimation is challenging since the phenological growing cycle of naturally existing grass is a dynamic process influenced by many complex parameters, including grass type, soil, climate, topography and land use. With the spectral information of remotely sensed imagery it is possible to detect green vegetation, which is driven by the photosynthetic biochemical process of grass biomass. However, since raster imagery is only a two dimensional representation of the land cover it is difficult to derive the quantity of the vertical grass biomass directly.
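One widely used way to detect green vegetation from spectral information is the Normalised Difference Vegetation Index (NDVI), which contrasts the red and near-infrared bands. The text above does not name NDVI, so it is introduced here only as an illustrative assumption. The pixel values are invented, and applying the index to raw grey values rather than calibrated reflectance is a simplification.

```python
def ndvi(red, near_infrared):
    """Normalised Difference Vegetation Index for one pixel; None if undefined."""
    total = near_infrared + red
    if total == 0:
        return None
    return (near_infrared - red) / total


# Hypothetical 8 bit grey values (red, near-infrared) for three pixels.
pixels = {"green grass": (15, 115), "dry grass": (60, 90), "bare soil": (80, 85)}
for label, (red, nir) in pixels.items():
    print(label, round(ndvi(red, nir), 2))
```

A higher value indicates more photosynthetically active cover, but the index describes greenness on a flat image and says nothing directly about the height or mass of the grass.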

Fractional cover [10] data are often available as derived products; for example, Geoscience Australia (GA) provides an Australian Reflectance Grid 25 (ARG25) product, which gives a 25 meter scale fractional cover representation of underlying vegetation across Australia, and TERN AusCover provides a 30 meter resolution product from Landsat 5 and 7 covering the temporal extent 2000-2011. Fractional cover unmixing algorithms use the spectral reflectance of a Landsat scene for a pixel to break it into three fractions represented as percentage values. These are photosynthetic vegetation (includes leaves and grass), non-photosynthetic vege-
