Team # 2115252 Page 5 of 25
2. From the date, latitude, and longitude of every positive report, we calculate the distance
between each location and the source of propagation. If the distance is plotted at one-day
intervals, the slope of the line between any two adjacent valid data points should be always
greater than zero or always less than zero.
3. According to the available data, the location of the positive report with the earliest date is
taken to be the source of transmission, and there should be no positive reports earlier than this
one.
4. Applying the positive-identification model to all unverified samples showed that almost none
of them would be re-identified as positive. Therefore, any positive data that may exist among the
unverified samples are ignored, and we base our propagation predictions only on the existing
known samples of the Asian giant hornet.
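Assumptions 2 and 3 can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of the haversine great-circle formula for distance, and the sample coordinates are all assumptions made for illustration only.

```python
import math
from datetime import date

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def find_source(reports):
    """Assumption 3: the positive report with the earliest date is the source."""
    return min(reports, key=lambda r: r[0])

def distances_from_source(reports):
    """Distance of every report from the source, ordered by date."""
    _, src_lat, src_lon = find_source(reports)
    ordered = sorted(reports, key=lambda r: r[0])
    return [haversine_km(src_lat, src_lon, lat, lon) for _, lat, lon in ordered]

def is_strictly_monotonic(values):
    """Assumption 2: successive differences are all positive or all negative."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(x > 0 for x in diffs) or all(x < 0 for x in diffs)

# Hypothetical positive reports: (date, latitude, longitude). Not real data.
sample_reports = [
    (date(2019, 9, 2), 49.1, -123.0),
    (date(2019, 9, 1), 49.0, -123.0),
    (date(2019, 9, 3), 49.3, -123.0),
]
sample_distances = distances_from_source(sample_reports)
sample_is_monotonic = is_strictly_monotonic(sample_distances)
```

In this sketch the source itself contributes a distance of zero, so the monotonicity check asks whether later reports move steadily away from (or steadily toward) that point.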
1.4.2 Hypothesis of classification model based on convolutional neural network (CNN)
In the files 2021MCM ProblemC Images by GlobalID.xlsx and 2021 MCM ProblemC DataSet.xlsx,
we found that the laboratory can give Lab comments only for reports that contain image
information. That is, although some witnesses submitted reports without images, the laboratory
cannot judge whether those reports describe Asian giant hornets. Therefore, in this model we
consider only reports that contain image information that can be judged, and we treat reports
without image information as invalid data. The reports provided by the witnesses contain .jpg,
.png, .mp4, and video files. The .jpg files make up the vast majority, and there is only one .mp4
and video file, so this model handles only .jpg files; the other file types are too few to matter
and are ignored.
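The filtering rule above can be sketched as follows. The helper names and the sample file names are hypothetical, and matching the extension case-insensitively is an assumption not stated in the text.

```python
from collections import Counter
from pathlib import PurePath

def extension_counts(filenames):
    """Count attachments per lower-cased file extension."""
    return Counter(PurePath(name).suffix.lower() for name in filenames)

def keep_jpg(filenames):
    """Keep only .jpg attachments (case-insensitive match is an assumption)."""
    return [name for name in filenames if PurePath(name).suffix.lower() == ".jpg"]

# Hypothetical attachment names, for illustration only.
sample_files = ["ATT1_hornet.jpg", "ATT2_nest.JPG", "ATT3_clip.mp4", "ATT4_scan.png"]
sample_counts = extension_counts(sample_files)
sample_jpgs = keep_jpg(sample_files)
```

Counting extensions first is one way to confirm that the discarded file types really are a small minority before dropping them.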
2 Problem analysis
2.1 Task1
Problem one requires us to analyze the spread of Asian giant hornets over time. From the
information in the task, we observe that the distribution of hornets varies with time, and that
the position at one time step is related to the position at the next. Therefore, time-series
analysis is appropriate. After testing, the data is found to be stationary, so the ARIMA model
can be used to solve the problem.
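To make the idea concrete, the sketch below fits only the autoregressive part of an ARIMA model by ordinary least squares, with optional first differencing. It is a deliberately simplified stand-in, not the authors' model: there is no moving-average term, the order (one lag, d equal to 0 or 1) is an assumption, and a full analysis would normally rely on a statistics library.

```python
def difference(series, d=1):
    """Apply first differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant input: fall back to phi = 0
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, steps, d=0):
    """Forecast `steps` values with an AR(1) model on the d-times differenced series."""
    if d not in (0, 1):
        raise ValueError("this sketch supports d = 0 or d = 1 only")
    work = difference(series, d)
    c, phi = fit_ar1(work)
    value, level = work[-1], series[-1]
    predictions = []
    for _ in range(steps):
        value = c + phi * value          # next value of the (differenced) series
        if d == 1:
            level += value               # integrate back to the original scale
            predictions.append(level)
        else:
            predictions.append(value)
    return predictions
```

With d=0 the series is modeled directly, which matches the stationarity claim above; d=1 is included only to show where differencing would enter if the data were not stationary.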
2.2 Task2
Problem two requires us to build a model of the likelihood of classification errors. To this end,
we use Python to match the Positive ID and Negative ID labels in the data set file to their
images one by one, and construct the training set and the test set. At the same time, the data is
filtered to remove pictures corresponding to unprocessed reports and other non-picture files.
We then use the training set to construct and train the h5 model, from which we draw our
conclusions.
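The matching and filtering steps described above might look like the following sketch. The column names ("GlobalID", "Lab Status", "FileName"), the label strings, the 80/20 split, and the sample records are assumptions for illustration; the paper does not specify them here.

```python
import random

def build_labeled_images(reports, images):
    """Join images to report labels by GlobalID; keep only labeled .jpg files."""
    status_by_id = {r["GlobalID"]: r["Lab Status"] for r in reports}
    labeled = []
    for img in images:
        status = status_by_id.get(img["GlobalID"])
        if status not in ("Positive ID", "Negative ID"):
            continue  # drop unprocessed, unverified, or unmatched reports
        if not img["FileName"].lower().endswith(".jpg"):
            continue  # drop non-picture files
        labeled.append((img["FileName"], 1 if status == "Positive ID" else 0))
    return labeled

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and split into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records, for illustration only.
sample_reports = [
    {"GlobalID": "A", "Lab Status": "Positive ID"},
    {"GlobalID": "B", "Lab Status": "Negative ID"},
    {"GlobalID": "C", "Lab Status": "Unprocessed"},
    {"GlobalID": "D", "Lab Status": "Negative ID"},
]
sample_images = [
    {"GlobalID": "A", "FileName": "a.jpg"},
    {"GlobalID": "B", "FileName": "b.JPG"},
    {"GlobalID": "C", "FileName": "c.jpg"},
    {"GlobalID": "D", "FileName": "d.mp4"},
    {"GlobalID": "E", "FileName": "e.jpg"},
]
sample_labeled = build_labeled_images(sample_reports, sample_images)
sample_train, sample_test = train_test_split(sample_labeled, test_fraction=0.5)
```

The resulting (file name, label) pairs would then be fed to whatever image loader and CNN training routine is used; that part is outside this sketch.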
2.3 Task3
Problem three requires us to build a model for deciding whether a report should be treated as
a positive identification. To this end, we introduce the AHP model to combine quantitative