Take-home Exercise 3 - Exploratory Data Analysis

Author

Seng Jing Yi

Published

May 18, 2024

Modified

June 9, 2024

Exploring VAST Challenge 2024 - Mini-Challenge 2

1. Introduction:

1.1 Setting the scene:

FishEye has learned that SouthSeafood Express Corp has been caught fishing illegally. The scandal caused a major disruption in the close-knit fishing community. FishEye has been collecting data on ship movements and shipping records in hopes that they could assemble a cohesive store of knowledge that will allow them to better understand local commercial fishing behavior.

1.2 Key Objectives of this assignment:

FishEye analysts need help to perform geographic and temporal analysis of the CatchNet data so they can prevent illegal fishing from happening again. This assignment will focus on sub-questions 1 & 4, namely:

| Sub-qn | Objectives | Techniques used |
|---|---|---|
| 1 | Visualization system to associate vessels with their probable cargos, including seasonality analysis to detect any anomalies in port exit records. | Visualising geographical data; visualisation with Treemap |
| 4 | Visualizing changes in fishing activities after SouthSeafood Express was caught. | Analysing time series; visual multi-variate analysis with parallel coordinates plot |

2. Data Import & Processing

2.1 Loading relevant packages and data

| Packages used | Purpose |
|---|---|
| jsonlite, sf | Importing the JSON file and the GeoJSON geographical data |
| tidyverse, lubridate, dplyr, readr, reshape2, data.table | Data wrangling and reshaping in preparation for visualisation, plus a text-wrapping helper function |
| ggplot2, ggdist, tidygraph, patchwork, sf, ggrepel | Statistical graphs and geographical plots with points and polygons, and tidying of graphs for clarity and aesthetics |
| DT, knitr, igraph, ggraph, plotly, visNetwork, ggiraph | Interactive graphs and data tables that let users drill down into areas of interest |
Code
# load packages (p_load installs any that are missing)
pacman::p_load(jsonlite, tidyverse, tidyr, 
               knitr, lubridate, dplyr, 
               igraph, ggraph, ggdist, ggplot2, 
               sf, tidygraph, reshape2, readr,
               DT, patchwork,plotly, data.table, ggrepel)

# load data
mc2_data <- fromJSON("data/mc2.json")

2.2 Data Cleaning

Reference: Kickstarter 2 by Prof Kam

2.2.1 Extracting edges and understanding the edge tibble

  1. Extracting the edges from the JSON file into a tibble and removing duplicates.
  2. Converting the date columns to datetime.
  3. Renaming columns that start with an underscore (e.g., “_raw_source”) to minimize downstream syntax errors.
  4. Creating subsets of the edge table based on the “type” column.
  5. Renaming columns to provide context and unique identifiers for downstream data mapping.
  6. For tx_c specifically, shifting the transaction date from T0 to T-1, as fish imports generally leave the harbor one day after delivery.
  7. The resultant data sets are:
| Resultant edge data set | Unique columns |
|---|---|
| Transactions (tx_c) | cargo_id, dest (destination), tx_date (transaction date), fish_species |
| Transponder pings (E_Tping_c) | start_time, dwell, ping_source, vessel_id |
| Harbor arrival reports (E_Hbrpt_c) | vessel_id, port, key, arr_date (arrival date), port_master, aphorism, holiday_greeting, saying (saying of the sea), wisdom |
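Step 6 is easy to get subtly wrong. By the time the shift is applied, the date column has already been converted with as_datetime(), so it is a POSIXct value counted in seconds. Subtracting 1 therefore moves it back one second, not one day. Below is a minimal base-R sketch of the difference, using a hypothetical non-midnight timestamp. For timestamps at exactly midnight, both versions happen to give the same calendar date after as.Date().

```r
# Hypothetical transaction timestamp stored as POSIXct (counted in seconds)
tx_time <- as.POSIXct("2035-02-01 12:00:00", tz = "UTC")

# Subtracting 1 from a POSIXct value removes one second, not one day
minus_one <- tx_time - 1

# One day is 86400 seconds; lubridate::days(1) expresses the same shift in UTC
minus_one_day <- tx_time - 86400

# Compare the resulting calendar dates
shifted_dates <- c(
  minus_one     = format(as.Date(minus_one, tz = "UTC")),
  minus_one_day = format(as.Date(minus_one_day, tz = "UTC"))
)
print(shifted_dates)
```

Within the tidyverse pipeline, lubridate's days(1) states the one-day shift more readably than the raw 86400 seconds.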
Code
# extract edges into a tibble and drop duplicate rows
mc2_edges <- as_tibble(mc2_data$links) %>% 
  distinct() 

#correcting data type - converting to date format
mc2_edges$time <- as_datetime(mc2_edges$time)
mc2_edges$"_last_edited_date" <- as_datetime(mc2_edges$"_last_edited_date")
mc2_edges$"_date_added" <- as_datetime(mc2_edges$"_date_added")
mc2_edges$date <- as_datetime(mc2_edges$date)


# rename columns with a leading "_" to prevent syntax errors downstream
mc2_edges <- mc2_edges %>%
  rename("last_edited_by" = "_last_edited_by",
         "date_added" = "_date_added",
         "last_edited_date" = "_last_edited_date",
         "raw_source" = "_raw_source",
         "algorithm" = "_algorithm") 

#checking data format
#glimpse(mc2_edges)

# Breaking into subsets based on event category
E_TransponderPing <- subset(mc2_edges,  mc2_edges$type == "Event.TransportEvent.TransponderPing")
E_HarborRpt <- subset(mc2_edges,  mc2_edges$type == "Event.HarborReport")
E_Tx <- subset(mc2_edges, mc2_edges$type == "Event.Transaction")

# Dropping columns that are NULL and renaming variables to separate them

# Transactions
E_Tx_c <- E_Tx %>%
  rename(
  cargo_id = source, 
  dest = target,
  tx_date = date) %>%
  mutate(tx_date = tx_date - days(1)) %>% # shift to T-1; on a datetime, "- 1" would subtract one second
  select(-c(key, algorithm, `raw_source`, `type`, `data_author`, `aphorism`, `holiday_greeting`, `wisdom`, `saying of the sea`, `time`, `dwell`)) 

# Separating the fish species for the respective cargo 
tx_sub1 <- E_Tx_c[grep("^City of", E_Tx_c$dest), ]
tx_sub2 <- E_Tx_c[!grepl("^City of", E_Tx_c$dest), ]
tx_sub2 <- tx_sub2 %>% rename(fish_species = dest)

tx_c <- left_join(tx_sub1, tx_sub2 %>% select(cargo_id, fish_species), by = "cargo_id")

# Dropping raw source: All Oceanus Centralized Export/Import Archive and Notification Service (OCEANS)
# Dropping algorithm: CatchMate ('arrrr' edition)
# Null columns - Data author, aphorism, holiday_greeting, wisdom, saying of the sea, time, dwell

# Transponder Ping
E_Tping_c <- E_TransponderPing %>%
  rename(vessel_id = target, 
         ping_source = source,
         start_time = time) %>%
  select(-c(key, `algorithm`, `raw_source`, `type`, `date`, `data_author`, `aphorism`, `holiday_greeting`, `wisdom`, `saying of the sea`)) 

#Dropping raw_source: All Oceanus Vessel Locator System
#Dropping algorithm: All OVLS-Catch&Hook
# Null columns - Date, Data author, aphorism, holiday_greeting, wisdom, saying of the sea

# Harbor Report
E_Hbrpt_c <- E_HarborRpt %>% rename(
  vessel_id = source, 
  port = target, 
  arr_date = date, 
  port_master = data_author, 
  saying = `saying of the sea`
) %>%
  select(-c(`algorithm`, `type`, `time`, `dwell`)) 

#Dropping algorithm: All HarborReportMaster 3.11
#Retain raw_source: Differing values depending on which Port / City

rm(tx_sub1, tx_sub2, E_TransponderPing,  E_HarborRpt, E_Tx, mc2_edges, E_Tx_c)
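Each cargo appears in two transaction edges. One edge targets a destination city and the other targets a fish species. The grep()/left_join() step above separates the two and pairs them back up by cargo_id. Below is a minimal base-R sketch of the same idea, with hypothetical cargo IDs and merge(all.x = TRUE) standing in for dplyr's left_join().

```r
# Hypothetical transaction edges: each cargo links to a city and to a species
tx_edges <- data.frame(
  cargo_id = c("c1", "c1", "c2", "c2"),
  dest     = c("City of Lomark", "gadusnspecificatae4ba",
               "City of Himark", "thunnininveradb7"),
  stringsAsFactors = FALSE
)

# Rows whose target is a city carry the destination
is_city <- grepl("^City of", tx_edges$dest)
tx_city <- tx_edges[is_city, ]

# The remaining rows carry the fish species of the same cargo
tx_fish <- tx_edges[!is_city, ]
names(tx_fish)[names(tx_fish) == "dest"] <- "fish_species"

# Left join on cargo_id: keep every city row and attach its species
tx_joined <- merge(tx_city, tx_fish, by = "cargo_id", all.x = TRUE)
print(tx_joined)
```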

2.2.2 Extracting nodes and understanding the node tibble

Repeating similar steps for the node records, we obtained the following resultant data sets. To minimize the number of data sets, node information on vessels and on locations (cities, points and regions) was appended into combined data sets. Each carries an assigned label identifying its category (e.g., vessel_type for vessels and kind for locations).

| Resultant nodes data set | Unique columns |
|---|---|
| N_fish (fish and their description) | fish_id, fish_species |
| N_Delivery_doc (cargo and details) | cargo_id, qty_tons (quantity in tons), deliver_date (delivery date) |
| N_vessel (vessels and their description) | vessel_id, vessel_name, vessel_company, flag_country, tonnage, length_overall, vessel_type |
| location_legend (points, cities, regions) | area, Activities, kind, fish_species_present |
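The location_legend data set is built in two moves. The city, point and region subsets are given the same four columns, with kind recording the category, and then stacked with rbind(). Below is a minimal base-R sketch with one hypothetical row per subset. The area and Activities values are placeholders; in the document they come from NL_City, NL_Point and NL_Region.

```r
# Hypothetical one-row subsets sharing identical column names;
# "NA" as text mirrors the placeholder used for cities and points
city_legend <- data.frame(area = "Lomark", Activities = "Placeholder activity",
                          kind = "city", fish_species_present = "NA",
                          stringsAsFactors = FALSE)
point_legend <- data.frame(area = "Placeholder point", Activities = "Placeholder activity",
                           kind = "point", fish_species_present = "NA",
                           stringsAsFactors = FALSE)
region_legend <- data.frame(area = "Cod Table", Activities = "Commercial fishing",
                            kind = "Fishing Ground",
                            fish_species_present = "Cod/Gadus n.specificatae",
                            stringsAsFactors = FALSE)

# rbind() needs matching column names; kind keeps each row's category
location_legend <- rbind(city_legend, point_legend, region_legend)
print(table(location_legend$kind))
```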
Code
#segmenting nodes data and checking for distinct records
mc2_nodes <- as_tibble(mc2_data$nodes) %>%
  distinct()

#renaming to remove the "_" 
mc2_nodes <- mc2_nodes %>%
  rename("last_edited_by" = "_last_edited_by",
         "date_added" = "_date_added",
         "last_edited_date" = "_last_edited_date",
         "raw_source" = "_raw_source",
         "algorithm" = "_algorithm") 

# flatten list-like text such as c("a", "b") into a, b by removing "c(", ")" and quotes

mc2_nodes_tidied <- mc2_nodes %>%
  mutate(Activities = gsub("c[(]", "", Activities)) %>% 
  mutate(Activities = gsub("\"", "", Activities)) %>%
  mutate(Activities = gsub("[)]", "", Activities)) 

mc2_nodes_tidied <- mc2_nodes_tidied %>%
  mutate(fish_species_present = gsub("c[(]", "", fish_species_present)) %>% 
  mutate(fish_species_present = gsub("\"", "", fish_species_present)) %>%
  mutate(fish_species_present = gsub("[)]", "", fish_species_present)) 

# Creating subset on nodes information
N_fish <- subset(mc2_nodes_tidied,  mc2_nodes_tidied$type == "Entity.Commodity.Fish") %>%
  select_if(~ !any(is.na(.))) %>% 
  select(-c(`type`, `raw_source`, `algorithm`, `Activities`, `fish_species_present`)) %>%
  rename(fish_species = name, 
         fish_id = id)

NL_City <- subset(mc2_nodes_tidied,  mc2_nodes_tidied$type == "Entity.Location.City") %>%
  select_if(~ !any(is.na(.))) %>%
  select(-c(`raw_source`, `algorithm`, `type`, `fish_species_present`)) %>%
  rename(city_name = Name, 
         city_id = id)


NL_Point <- subset(mc2_nodes_tidied,  mc2_nodes_tidied$type == "Entity.Location.Point") %>%
  select_if(~ !any(is.na(.))) %>%
  select(-c(`raw_source`, `algorithm`, `kind`, `fish_species_present`)) %>%
  rename(point_name = Name, 
         point_id = id)

## Need to tidy NL Region
NL_Region <- subset(mc2_nodes_tidied,  mc2_nodes_tidied$type == "Entity.Location.Region") %>%
  select_if(~ !any(is.na(.))) %>%
  select(-c(`raw_source`, `algorithm`, `type`, `Description`)) %>%
  rename(region_name = Name, 
         region_id = id, 
         region_kind = kind)

N_Delivery_doc <- subset(mc2_nodes_tidied,  mc2_nodes_tidied$type == "Entity.Document.DeliveryReport") %>%
  select_if(~ !any(is.na(.))) %>%
  rename(deliver_date = date,
         cargo_id = id) %>%
  select(-c(`algorithm`, `type`, `raw_source`, `Activities`, `fish_species_present`)) 

## consider adding back more columns: select_if() dropped columns that were only partially NA

N_vessel <- mc2_nodes_tidied %>%
  filter(grepl("Entity.Vessel", type)) %>%
  mutate(vessel_type = case_when(
    grepl("FishingVessel", type, ignore.case = TRUE) ~ "Fishing",
    grepl("Ferry.Passenger", type, ignore.case = TRUE) ~ "Ferry_Passenger",
    grepl("Ferry.Cargo", type, ignore.case = TRUE) ~ "Ferry_Cargo",
    grepl("Research", type, ignore.case = TRUE) ~ "Research", 
    grepl("Other", type, ignore.case = TRUE) ~ "Other", 
    grepl("Tour", type, ignore.case = TRUE) ~ "Tour", 
    grepl("CargoVessel", type, ignore.case = TRUE) ~ "Cargo_Vessel"
    )) %>% 
  mutate(company = ifelse(is.na(company), "Unknown", company)) %>% # Handle NA values by replacing NA with unknown
  select(-c(`algorithm`, `type`, `raw_source`, `Activities`, `fish_species_present`, `Description`, `kind`, `style`, `name`, `qty_tons`,`date`)) %>%
  rename(vessel_id = id, 
         vessel_name = Name, 
         vessel_company = company)


# Further exploring records with NA values
partial_na_records <- N_vessel[!complete.cases(N_vessel), ] %>% 
    select(-c(last_edited_by, date_added, last_edited_date)) 

partial_na_sum <- partial_na_records %>% 
  group_by(vessel_type) %>% 
  summarize(count = n()) 

# Plot NA records
partial_na_plot <- ggplot(partial_na_sum, aes(x = vessel_type, y = count)) + 
  geom_bar(stat = "identity", fill = "grey") + 
  geom_text(aes(label = count), vjust = -0.2, size = 4) + 
  labs(title = "Breakdown of vessel type for partial NA records", 
       x = "Vessel Type",
       y = "Count of vessel with NA") + theme_minimal(base_size = 12) + 
  theme(axis.line = element_line(color = "black"))

partial_na_plot

Code
# Display partial NA records

datatable(partial_na_records, options = list(pageLength = 5)
          , filter = "top")
Code
# merging ping with location details (region_id, city_id, point_id)
## To do: decide how to handle the list within Activities, and how to include region_kind
## To do: decide whether to retain some description to distinguish points, regions and cities

city_legend <- NL_City %>% 
  select(c(`city_id`, `Activities`, `kind`)) %>%
  mutate(fish_species_present = "NA") %>%
  rename(area = city_id)

point_legend <- NL_Point %>% 
  select(c(`point_id`, `Activities`))  %>%
  mutate(kind = "point", 
         fish_species_present = "NA") %>%
  rename(area = point_id)

region_legend <- NL_Region %>% 
  select(c(`region_id`, `Activities`, `region_kind`, `fish_species_present`))  %>%
  rename(area = region_id, kind = region_kind)

location_legend <- rbind(city_legend, point_legend, region_legend) 

write_csv(N_vessel, "data/N_vessel.csv")
write_csv(location_legend, "data/location_legend.csv")

#dropping unnecessary tables
rm(mc2_data, mc2_nodes_tidied, partial_na_records, city_legend, point_legend, region_legend, partial_na_sum, partial_na_plot, mc2_nodes)

Insights:

  • A total of 18 vessels have partial NA values.
  • All of them have vessel_company = “Unknown” and belong to Oceanus (flag_country = Oceanus).
  • None of the 18 vessels is a fishing or cargo vessel, so they are excluded from our analysis.
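The check behind these insights uses complete.cases() to keep the rows with at least one missing value, then counts those rows per vessel type. Below is a minimal base-R sketch on a hypothetical vessel table.

```r
# Hypothetical vessel attributes with some missing values
vessels <- data.frame(
  vessel_id    = c("v1", "v2", "v3", "v4"),
  vessel_type  = c("Fishing", "Tour", "Research", "Tour"),
  tonnage      = c(100, NA, 250, NA),
  flag_country = c("Oceanus", "Oceanus", NA, "Oceanus"),
  stringsAsFactors = FALSE
)

# Keep rows with at least one NA in any column
partial_na <- vessels[!complete.cases(vessels), ]

# Count incomplete records per vessel type
na_by_type <- table(partial_na$vessel_type)
print(na_by_type)
```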

2.2.3 Merging back the data after processing

To bring the context of the node details into the various edges, the related node attributes were appended to the edge data sets. This streamlined the records into 3 consolidated data sets.

| Consolidated data set | Resultant edge data set | Mapped with nodes data set |
|---|---|---|
| Transactions with cargo weight (tx_qty) | tx_c: cargo_id, destination, transaction date, fish species | N_Delivery_doc: cargo_id, quantity in tons, delivery date |
| Ping activity with vessel details by vessel_id and possible fish caught by location (ping_activity) | E_Tping_c: start_time, dwell, ping source, vessel_id | N_vessel: vessel_id, vessel name, vessel company, flag country, tonnage, overall length, vessel type; location_legend: area, activities, kind, fish species present |
| Harbor arrival reports with vessel details (E_Hbrpt_v) | E_Hbrpt_c: vessel_id, port, key, arrival date, port master, aphorism, holiday greeting, saying of the sea, wisdom | N_vessel: vessel_id, vessel name, vessel company, flag country, tonnage, overall length, vessel type |
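Each consolidation is a left join: every edge row is kept, and node attributes are attached through a shared key (cargo_id, vessel_id or area). Below is a minimal base-R sketch using merge(all.x = TRUE). It keeps the same rows as dplyr's left_join(), though it sorts them by the key. The IDs and quantities are hypothetical.

```r
# Hypothetical edge rows (transactions) and node rows (delivery documents)
tx_demo <- data.frame(cargo_id = c("c1", "c2", "c3"),
                      dest = c("City of Lomark", "City of Himark", "City of Lomark"),
                      stringsAsFactors = FALSE)
delivery_demo <- data.frame(cargo_id = c("c1", "c2"),
                            qty_tons = c(12.5, 30),
                            stringsAsFactors = FALSE)

# Left join: keep every transaction, add qty_tons where cargo_id matches
tx_qty_demo <- merge(tx_demo, delivery_demo, by = "cargo_id", all.x = TRUE)
print(tx_qty_demo)

# Transactions without a delivery document get NA quantities
sum(is.na(tx_qty_demo$qty_tons))
```

Counting NAs in the appended columns after each join is a quick check that the keys line up.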
Code
# merging the quantity from delivery_doc with transaction based on cargo id
tx_qty <- left_join(tx_c, N_Delivery_doc %>% select(cargo_id, qty_tons, deliver_date), by = "cargo_id")

# Merging the harbor report with vessel details to identify vessel type, tonnage and overall length

E_Hbrpt_v <- left_join(E_Hbrpt_c, N_vessel %>% select(vessel_id, vessel_name, tonnage, length_overall, flag_country, vessel_company, vessel_type), by = "vessel_id")

# Merging ping transaction with the location type
E_Tping_c2 <- E_Tping_c %>% rename(area = ping_source)
ping_activity <- left_join(E_Tping_c2, location_legend %>% select(area, Activities, kind, fish_species_present), by = "area")
ping_activity <- left_join(ping_activity, N_vessel %>% select(vessel_id, vessel_name, tonnage, length_overall, flag_country, vessel_company, vessel_type), by = "vessel_id")

# removing "City of" from the "area column" 
vessel_movement <- ping_activity %>%
  mutate(area = gsub("^City of", "", area)) %>%
  mutate(area = gsub("^\\s+", "", area))

#write_csv(tx_qty, "data/tx_qty.csv")
#write_csv(ping_activity, "data/ping_activity.csv")
#write_csv(E_Hbrpt_v, "data/hbrpt.csv")

rm(tx_c, E_Tping_c2, E_Tping_c, E_Hbrpt_c)

3. Exploratory Data Analysis

3.0 Understanding areas in Oceanus

Reference: Kickstarter 3 by Prof Kam

Steps taken:

  1. Import the Oceanus geography as a GeoJSON layer (polygons and points) and as an ESRI shapefile layer containing the centroid coordinates of each geographical area.
  2. Extract the coordinates from the shapefile layer and append them to the vessel movement data frame so that vessel movements can be plotted.
  3. Filter the vessels of interest to those belonging to “SouthSeafood Express Corp” and plot the routes taken by its 2 vessels, “Snappersnatcher7be” and “Roachrobberdb6”.
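Plotting routes (step 3) only makes sense if each vessel's pings are ordered in time before the points are joined into a line. In the code this is done by arrange(vessel_id, start_time), followed by group_by(vessel_id) and st_cast("LINESTRING"). Below is a minimal base-R sketch of the ordering and grouping alone, with hypothetical pings and coordinates and no sf objects.

```r
# Hypothetical transponder pings, deliberately out of time order
pings <- data.frame(
  vessel_id  = c("roachrobberdb6", "snappersnatcher7be",
                 "roachrobberdb6", "snappersnatcher7be"),
  start_time = as.POSIXct(c("2035-02-01 10:00", "2035-02-01 08:00",
                            "2035-02-01 06:00", "2035-02-01 09:00"), tz = "UTC"),
  x = c(-165.1, -166.2, -165.8, -166.0),
  y = c(39.1, 38.9, 39.4, 39.0),
  stringsAsFactors = FALSE
)

# Sort by vessel, then by time, so each route follows the order visited
pings <- pings[order(pings$vessel_id, pings$start_time), ]

# One coordinate matrix per vessel: the vertices of its trajectory line
routes <- lapply(split(pings, pings$vessel_id), function(p) cbind(p$x, p$y))
routes[["roachrobberdb6"]]
```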

Visualisation improvement:

  1. Color coding the regions by type for easier identification (e.g., beige for islands, light blue for fishing grounds and light green for ecological preserves).

Insights from SouthSeafood Express Corp vessels:

  • For “Snappersnatcher7be”, the cities most commonly visited are City of Lomark and City of Paackland. It fishes legally in Wrasse Beds and Cod Table, with possible illegal fishing activity at Ghoti Preserve.

  • For “Roachrobberdb6”, the cities most commonly visited are City of Himark and City of Lomark. It fishes legally in Wrasse Beds, with potential illegal fishing activity at Ghoti Preserve.

Code
oceanus_geog = st_read("data/Oceanus Geography.geojson") %>% st_transform(crs = 4326)
Reading layer `Oceanus Geography' from data source 
  `C:\sengjingyi\ISSS608\Take-home_Ex\Take-home_Ex03\data\Oceanus Geography.geojson' 
  using driver `GeoJSON'
Simple feature collection with 29 features and 7 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: -167.0654 ymin: 38.07452 xmax: -163.2723 ymax: 40.67775
Geodetic CRS:  WGS 84
Code
# basic plot of underlying geography (polygons, points)
# ggplot(data = oceanus_geog) + geom_sf()
# write_rds(oceanus_geog, "data/rds/OceanusGeography.rds")

OceanusLocations <- st_read(dsn = "data/shp",
  layer = "Oceanus Geography")
Reading layer `Oceanus Geography' from data source 
  `C:\sengjingyi\ISSS608\Take-home_Ex\Take-home_Ex03\data\shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 27 features and 7 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -167.0654 ymin: 38.07452 xmax: -163.2723 ymax: 40.67775
Geodetic CRS:  WGS 84
Code
write_rds(OceanusLocations, 
  "data/rds/OceanusLocations.rds")

# extract coordinates from df
coords <- st_coordinates(OceanusLocations)

# drop geometry columns
OceanusLocations_df <- OceanusLocations %>%
  st_drop_geometry()

# append x and y coordinates into df as columns
OceanusLocations_df$XCOORD <- coords[, "X"]
OceanusLocations_df$YCOORD <- coords[, "Y"]

# tidy df by renaming column 
OceanusLocations_df <- OceanusLocations_df %>%
  select(Name, X.Kind, XCOORD, YCOORD) %>%
  rename(Loc_Type = X.Kind)

# left join to append back to vessel movement 

vessel_movement <- vessel_movement %>%
  left_join(OceanusLocations_df,
            by = c("area" = "Name"))

# save the merged vessel movement data as an RDS file

write_rds(vessel_movement, "data/rds/vessel_movement_data.rds")

# convert vessel movement data.frame into sf point data.frame 
vessel_movement_sf <- vessel_movement %>%
  st_as_sf(coords = c("XCOORD", "YCOORD"), 
           crs = 4326)

# arrange records by vessel ID and ping time
vessel_movement_sf <- vessel_movement_sf %>%
  arrange(vessel_id, start_time)

# convert vessel movement sf from point into linestring features known as vessel trajectory
vessel_trajectory <- vessel_movement_sf %>%
  group_by(vessel_id) %>%
  summarize(do_union = FALSE) %>%
  st_cast("LINESTRING")

## include placeholder for vessel of interest and colors assigned

vessels_of_interest <- c("snappersnatcher7be", "roachrobberdb6")
vessel_colors <- c("snappersnatcher7be" = "red", "roachrobberdb6" = "purple")

# creating route for selected vessel 
vessel_trajectory_selected <- vessel_trajectory %>%
  filter(vessel_id %in% vessels_of_interest)

# defining colors for X.kind

kind_colors <- c(
  "Island" = "beige", 
  "Fishing Ground" = "lightblue", 
  "Ecological Preserve" = "lightgreen", 
  "city" = "purple", 
  "buoy" = "blue")

ggplot() +
  geom_sf(data = oceanus_geog, aes(fill = X.Kind), color = "black") +
  scale_fill_manual(values = kind_colors) + 
  geom_sf(data = vessel_trajectory_selected, 
          aes(color = vessel_id), 
          linewidth = 1) + # line thickness (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  scale_color_manual(values = vessel_colors) + 
  geom_text(data = OceanusLocations_df, 
            aes(x = XCOORD, y = YCOORD, label = Name), 
            size = 2, hjust = 1, vjust = 1) +
  theme_minimal() +
  labs(title = "Trajectories of SouthSeafood Express Corp vessels", 
       x = "Longitude", y = "Latitude", color = "Vessel ID")

3.1 Understanding possible fish species and signs of illegal fishing

Steps taken:

  1. Unnest the list of fish species stored in each region node and spread it into a presence/absence matrix of the fish species found in each region.
  2. Compare the fish species found in the regions with the fish species recorded in the cargo transactions to detect any deviation that requires further investigation.
  3. Plot a stacked bar graph to visualise the total quantity (tons) of each fish species delivered to each destination.
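Steps 1 and 2 start from strings such as c("A", "B"), which appear when a list column is coerced to text. Below is a minimal base-R sketch of the unnest-and-spread step:

  • strip the c( wrapper, closing parenthesis and quotes;
  • split on ", ";
  • cross-tabulate region against species to get a 0/1 presence matrix.

The two regions and their species strings are hypothetical, shortened examples in the same format.

```r
# Hypothetical list-like strings, as produced by coercing a list column to text
regions <- data.frame(
  region_name = c("Cod Table", "Ghoti Preserve"),
  fish_species_present = c('c("Cod/Gadus n.specificatae", "Beauvoir/Habeas pisces")',
                           'c("Offidiaa/Piscis osseus", "Beauvoir/Habeas pisces")'),
  stringsAsFactors = FALSE
)

# Remove "c(", ")" and quotes, then split into one vector per region
cleaned <- gsub('c\\(|\\)|"', "", regions$fish_species_present)
species_list <- strsplit(cleaned, ", ", fixed = TRUE)

# One row per region-species pair (the unnest step)
long <- data.frame(
  region_name = rep(regions$region_name, lengths(species_list)),
  species     = unlist(species_list),
  stringsAsFactors = FALSE
)

# Cross-tabulate into a 0/1 presence matrix (the spread step)
presence <- table(long$region_name, long$species)
print(presence)
```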

Visualisation improvement:

  1. Color coding the fish species for easier identification: illegal fish species in red, commonly caught fish species in blue, and the additional species “Salmon” in yellow for further investigation.
  2. Factoring the fish species to control the layer order of the stacked bars. Illegal fish species are clustered together at the base, followed by the unexpected species (“Salmon”), with the regular fish species at the top.
  3. Factoring the destination city so that the bars are arranged in descending order of total quantity (tons).
  4. Including an adjustable time frame (earliest_date and latest_date) to select the period of interest when summarising the cargo.
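Items 2 and 3 both work through factor levels. With ggplot2's default position_stack(), the first factor level is drawn at the top of each stacked bar and the last level at the base. The destination order in the code is typed in by hand. An alternative, not the approach used above, is to derive the order from the totals so that it stays descending when earliest_date or latest_date change. Below is a minimal base-R sketch with hypothetical totals.

```r
# Hypothetical tons per destination and species (in the document this is
# total_qty_tons_per_dest)
per_species <- data.frame(
  dest = c("City of Himark", "City of Himark", "City of Haacklee",
           "City of Paackland", "City of Paackland"),
  fish_species = c("gadusnspecificatae4ba", "piscesfoetidaae7",
                   "thunnininveradb7", "gadusnspecificatae4ba",
                   "habeaspisces4eb"),
  total_qty_tons = c(300, 220, 180, 600, 310),
  stringsAsFactors = FALSE
)

# Total tons per destination across all species
dest_totals <- tapply(per_species$total_qty_tons, per_species$dest, sum)

# Destination names from largest to smallest total
dest_order <- names(dest_totals)[order(dest_totals, decreasing = TRUE)]

# Factor levels fix the left-to-right order of the bars
per_species$dest <- factor(per_species$dest, levels = dest_order)
levels(per_species$dest)
```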

Insights

  • Three fish species are present only in ecological preserves: (1) Offidiaa/Piscis osseus, (2) Sockfish/Pisces foetida and (3) Helenaa/Pisces satis.

  • Hence, any cargo containing these fish species has likely violated fishing regulations by fishing in an ecological preserve: (1) Don Limpet Preserve, (2) Ghoti Preserve or (3) Nemo Reef.

  • An additional species, “oncorhynchusrosea790”, appears in the cargo transactions but is not listed for any region. Oncorhynchus is the genus of Pacific salmon, so this species is referred to here as “Salmon”.

Code
# Formatting region data to identify fish type in region
region_species <- NL_Region %>%
  mutate(fish_species_present = gsub('c\\(|\\)|"', "", fish_species_present), 
    fish_species_present = strsplit(as.character(fish_species_present), ", ")) %>%
  unnest(fish_species_present) %>%
  mutate(presence = 1) %>%
  spread(key = fish_species_present, value = presence, fill = 0)

region_species_c <- region_species %>% 
  select( -region_id, -last_edited_by, -last_edited_date, -date_added)

kable(region_species_c)
| region_name | Activities | region_kind | Beauvoir/Habeas pisces | Birdseye/Pisces frigus | Cod/Gadus n.specificatae | Harland/Piscis sapidum | Helenaa/Pisces satis | Offidiaa/Piscis osseus | Sockfish/Pisces foetida | Tuna/Thunnini n.vera | Wrasse/Labridae n.refert |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Don Limpet Preserve | Recreation, Tourism | Ecological Preserve | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| Cod Table | Commercial fishing | Fishing Ground | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Tuna Shelf | Commercial fishing, Sport fishing | Fishing Ground | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Ghoti Preserve | Research, Tourism, Recreation | Ecological Preserve | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| Nemo Reef | Recreation, Tourism | Ecological Preserve | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| Wrasse Beds | Commercial fishing | Fishing Ground | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
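The first and third insights can be checked mechanically:

  • A species occurs only in ecological preserves if its column sums to zero over the non-preserve regions.
  • Species seen in cargo but absent from every region fall out of setdiff().

The sketch below re-enters the 0/1 values from the table above in base R. cargo_species is a hypothetical stand-in for the cargo species after mapping them to the same labels.

```r
species <- c("Beauvoir/Habeas pisces", "Birdseye/Pisces frigus",
             "Cod/Gadus n.specificatae", "Harland/Piscis sapidum",
             "Helenaa/Pisces satis", "Offidiaa/Piscis osseus",
             "Sockfish/Pisces foetida", "Tuna/Thunnini n.vera",
             "Wrasse/Labridae n.refert")

# Presence matrix copied from the region table (rows = regions)
presence <- rbind(
  "Don Limpet Preserve" = c(1, 1, 0, 0, 1, 0, 1, 1, 0),
  "Cod Table"           = c(1, 1, 1, 0, 0, 0, 0, 0, 0),
  "Tuna Shelf"          = c(1, 1, 0, 1, 0, 0, 0, 1, 0),
  "Ghoti Preserve"      = c(1, 0, 0, 0, 1, 1, 0, 0, 1),
  "Nemo Reef"           = c(1, 1, 0, 0, 1, 0, 0, 1, 1),
  "Wrasse Beds"         = c(1, 1, 0, 0, 0, 0, 0, 0, 1)
)
colnames(presence) <- species
is_preserve <- rownames(presence) %in%
  c("Don Limpet Preserve", "Ghoti Preserve", "Nemo Reef")

# Preserve-only: never present in any non-preserve region
preserve_only <- species[colSums(presence[!is_preserve, , drop = FALSE]) == 0]
print(preserve_only)

# Species in cargo but in no region (hypothetical cargo list)
cargo_species <- c(species[1:3], "Salmon (oncorhynchusrosea790)")
setdiff(cargo_species, species)
```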
Code
# comparing the list of unique fish species in the tx_qty
unique_fish_cargo <- unique(tx_qty$fish_species)

# unique_fish_cargo has additional species of salmon - oncorhynchusrosea790

#Aligning the naming convention for fish species

fish_species_labels <- c(
  "gadusnspecificatae4ba" = "Cod/Gadus n.specificatae", 
  "piscesfrigus900" = "Birdseye/Pisces frigus", 
  "piscesfoetidaae7" = "Sockfish/Pisces foetida", # illegal
  "labridaenrefert9be" = "Wrasse/Labridae n.refert", 
  "habeaspisces4eb" = "Beauvoir/Habeas pisces", 
  "piscissapidum9b7" = "Harland/Piscis sapidum", 
  "thunnininveradb7" = "Tuna/Thunnini n.vera", 
  "piscisosseusb6d" = "Offidiaa/Piscis osseus", # illegal
  "piscessatisb87" = "Helenaa/Pisces satis", # illegal
  "oncorhynchusrosea790" = "Salmon (oncorhynchusrosea790)" # found in cargo only
)

## assign specific colors to fish species, red for illegal. 

fish_species_color <- c("piscesfoetidaae7" = "#FF6666", 
                        "piscisosseusb6d" = "#FF9999", 
                        "piscessatisb87" = "#FFCCCC", 
                        "gadusnspecificatae4ba" = "#CCE5FF", 
                        "piscissapidum9b7" = "#99CCFF", 
                        "habeaspisces4eb" = "#66B2ff", 
                        "piscesfrigus900" = "#CCE5FF", 
                        "oncorhynchusrosea790" = "#FFFF99", 
                        "labridaenrefert9be" = "#99CCFF", 
                        "thunnininveradb7" = "#66b2ff"
                        )

# include parameters that users can change to set the time frame
tx_qty$tx_date <- as.Date(tx_qty$tx_date, format = "%Y-%m-%d")

earliest_date <- min(tx_qty$tx_date, na.rm = TRUE)
latest_date <- max(tx_qty$tx_date, na.rm = TRUE)

## filtering the data set of interest 

tx_qty_of_interest <- tx_qty %>% 
  filter(tx_date >= earliest_date & tx_date <= latest_date)

# summarise total tons of fish per location
total_qty_tons_per_dest <- tx_qty_of_interest %>%
  group_by(dest, fish_species) %>%
  summarize(total_qty_tons = sum(qty_tons, na.rm = TRUE)) %>%
  ungroup()

# reordering levels for fish species for tidier plot
total_qty_tons_per_dest$fish_species <- factor(
  total_qty_tons_per_dest$fish_species, 
  levels = c("gadusnspecificatae4ba", "piscissapidum9b7", "habeaspisces4eb", 
             "piscesfrigus900", "labridaenrefert9be", "thunnininveradb7", 
             "oncorhynchusrosea790", # unidentified - Salmon
             "piscesfoetidaae7","piscisosseusb6d", "piscessatisb87" )) #illegal

# reordering levels to arrange bars in descending order 
total_qty_tons_per_dest$dest <- factor(
  total_qty_tons_per_dest$dest, 
  levels = c("City of Paackland", "City of Himark", "City of Lomark", "City of South Paackland", "City of Haacklee" )) 


# plot the occurrence of illegal fish species at each port, covering all unique fish species in the cargo reports (tx_qty)

p_qty_dest <- ggplot(total_qty_tons_per_dest, 
       aes(x = dest, y = total_qty_tons, fill = fish_species)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = fish_species_color,  labels = fish_species_labels) +
  labs(title = "Total Quantity of Fish by Destination and Species",
       x = "Destination",
       y = "Total Quantity (tons)",
       fill = "Fish Species") +
  theme_minimal(base_size = 8) +
  theme(axis.line = element_line(color = "black"))

p_qty_dest

Work in progress: an attempt to link each fish species (and its location of origin) to the port that received it, based on the cargo transactions. The overlay of geographical points and polygons is still missing.

Code
# Network analysis to see mapping of fish species delivered to which port at what quantity

edges <- total_qty_tons_per_dest %>%
  select(from = fish_species, to = dest, weight = total_qty_tons, dest = dest)

# Create graph from data frame
graph <- graph_from_data_frame(edges, directed = FALSE)

# Convert to tbl_graph object
graph_tbl <- as_tbl_graph(graph)

# Plot the network graph using ggraph
fish_dest_map <- ggraph(graph_tbl, layout = 'nicely') + 
  geom_edge_link(aes(width = weight), alpha = 0.8) +
  geom_node_point(aes(color = ifelse(name %in% edges$from, "Fish Species", "Destination")), size = 2) +
  geom_node_text(aes(label = name), vjust = 1.5, hjust = 1.5, size = 2)+
  scale_edge_width(range = c(0.5, 5)) +
  scale_color_manual(values = c("Fish Species" = "lightblue", "Destination" = "beige")) +
  theme_void(base_size = 8) +
  labs(title = "Network Analysis of Fish Species per Destination", color = "Node Type", edge_width = "Total Quantity (tons)") + 
  facet_edges(~dest) + th_foreground(foreground = "grey",border = TRUE) + theme(legend.position = "bottom")

fish_dest_map

Code
rm(edges, graph, graph_tbl, fish_dest_map)

Steps taken:

  1. Plot a box plot combined with ggdist::stat_dots() to show the quantity of cargo received per fish species at each destination, faceted with facet_grid(. ~ dest).

Visualisation improvement

  1. Highlight abnormal cargo records (qty_tons <= 0) in red, with a dashed blue reference line at 0 tons.
  2. Use facet_grid() to compare the fish species received across the various ports.

Insights:

  1. Cargo records with quantity <= 0 are scattered across cities and fish species.

  2. Some ports (e.g., City of Haacklee and City of Himark) receive no cargo of certain fish species. For example, “thunnininveradb7” is only offloaded at City of Haacklee and City of South Paackland.
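The first insight can be backed by a count as well as by the red dots. Below is a minimal base-R sketch that flags records with qty_tons <= 0 and cross-tabulates them by destination and species. The cargo records are hypothetical.

```r
# Hypothetical cargo records
cargo <- data.frame(
  dest = c("City of Himark", "City of Himark", "City of Lomark",
           "City of Lomark", "City of Haacklee"),
  fish_species = c("gadusnspecificatae4ba", "gadusnspecificatae4ba",
                   "thunnininveradb7", "piscesfrigus900", "thunnininveradb7"),
  qty_tons = c(12.4, -0.5, 0, 8.1, -2.3),
  stringsAsFactors = FALSE
)

# Flag abnormal records: zero or negative cargo weight
abnormal <- cargo[cargo$qty_tons <= 0, ]

# Count abnormal records per destination and species
abnormal_counts <- table(abnormal$dest, abnormal$fish_species)
print(abnormal_counts)
```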

Code
ggplot(tx_qty, aes(x = fish_species, y = qty_tons)) + 
  geom_boxplot(
    width = .15, 
    ## remove outliers
    outlier.color = NA ## `outlier.shape = NA` or `outlier.alpha = 0` works as well
  ) +
  ## add dot plots from {ggdist} package
  ggdist::stat_dots(
    ## orientation to the right
    side = "right", 
    justification = -0.2,
    ## adjust grouping (binning) of observations 
    binwidth = 1, 
    dotsize = 0.1
  ) +
  ## add highlighted dots where qty_tons <= 0
  ggdist::stat_dots(
    data = subset(tx_qty, qty_tons <= 0),
    side = "right",
    justification = -0.2,
    binwidth = 1,
    dotsize = 0.1,
    color = "red"
  ) +
  ## add horizontal line at qty_tons = 0
  geom_hline(yintercept = 0, linetype = "dashed", color = "blue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Quantity of Fish Species by Weight (Tons)",
       x = "Fish Species",
       y = "Quantity (Tons)") +
  theme(axis.text.x = element_text(hjust = 1, size = 7),
    axis.text.y = element_text(size = 7),
    plot.title = element_text(size = 12, face = "bold"),
    strip.text = element_text(size = 7),
    plot.margin = unit(c(1, 1, 1, 1), "mm"), 
    panel.spacing = unit(0.5,"lines")) + 
  facet_grid(. ~ dest)

3.2 Understanding Ownership of Vessels

Steps taken:

  1. Group vessels by company and type to count the number of vessels of each type per company.
  2. Arrange the resulting table in descending order of vessel count.
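These two steps amount to a grouped count followed by a descending sort. Below is a minimal base-R sketch using table() on hypothetical vessel records; the company names are placeholders.

```r
# Hypothetical vessel records: one row per vessel
vessels <- data.frame(
  vessel_company = c("Co A", "Co A", "Co B", "Unknown", "Unknown", "Unknown"),
  vessel_type    = c("Fishing", "Fishing", "Fishing",
                     "Cargo_Vessel", "Cargo_Vessel", "Tour"),
  stringsAsFactors = FALSE
)

# Count vessels for every company-type combination
counts <- as.data.frame(table(vessel_company = vessels$vessel_company,
                              vessel_type = vessels$vessel_type),
                        responseName = "vessel_type_count",
                        stringsAsFactors = FALSE)

# table() also lists combinations that never occur; drop them
counts <- counts[counts$vessel_type_count > 0, ]

# Largest counts first
counts <- counts[order(counts$vessel_type_count, decreasing = TRUE), ]
print(counts)
```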

3.2.1: Ownership of “Fishing Vessels” largely belonging to “Oceanus”

Steps taken:

  1. Focusing on the key category of “Fishing”, plot a bar chart of the number of companies that own a given number (x) of fishing vessels.
  2. Visualise the mapping of companies to vessels for companies that own 4 or more fishing vessels (the threshold, no_of_interest, is adjustable).
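Step 1 is a count of counts. First count the fishing vessels per company, then count the companies at each fleet size, collapsing the company names into one string for labelling. Step 2 is a filter on the threshold. Below is a minimal base-R sketch with hypothetical companies; no_of_interest mirrors the document's threshold variable.

```r
# Hypothetical fishing-vessel ownership: one row per vessel
fishing <- data.frame(
  vessel_company = c("Co A", "Co A", "Co A", "Co A",
                     "Co B", "Co C", "Co C", "Co D"),
  stringsAsFactors = FALSE
)

# Number of fishing vessels owned by each company
per_company <- table(fishing$vessel_company)

# For each fleet size: how many companies, and which ones
vessel_counts <- sort(unique(as.integer(per_company)))
summary_tbl <- data.frame(
  vessel_type_count = vessel_counts,
  company_count = sapply(vessel_counts, function(k) sum(per_company == k)),
  company_names = sapply(vessel_counts, function(k)
    toString(names(per_company)[per_company == k])),
  stringsAsFactors = FALSE
)
print(summary_tbl)

# Companies at or above the adjustable threshold
no_of_interest <- 4
names(per_company)[per_company >= no_of_interest]
```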

Visualisation Improvements:

  1. Counted the known companies owning fishing vessels, grouped by the number of fishing vessels they own.
  2. Highlighted the names of companies with 4 or more fishing vessels (likely the larger companies). The threshold for the network graph is set by the adjustable parameter no_of_interest.
  3. Mapped the names of the fishing vessels associated with these companies of interest.

Insights:

  • For cargo vessels, 99 vessels belong to an “Unknown” company. Only 1 cargo vessel has a known company, “Saltwater Sisters Company Marine”.

  • Saltwater Sisters Company Marine is the only known company with 2 vessel types. Vessels of the other types (Tour, Research, Other) belong to “Unknown” companies.

  • All other known companies own fishing vessels: 45 companies own 1 vessel, 36 own 2, 13 own 3 and 5 own 4 or more.

Code
vessel_per_type_company <- N_vessel %>% 
  group_by(vessel_company, vessel_type) %>% 
  summarise(vessel_type_count = n()) %>% 
  arrange(desc(vessel_type_count)) # largest counts first

vessel_type_company <- vessel_per_type_company %>% 
  group_by(vessel_company) %>% 
  summarise(vessel_count = n()) %>% 
  arrange(desc(vessel_count)) # companies with the most vessel types first

# expose datatable on the count of vessel type per company. 

datatable(vessel_per_type_company, 
          options = list(pageLength = 5), 
          filter = "top")
Code
# For the plot, focus on vessel_type == "Fishing"

fish_vessel <- vessel_per_type_company %>% filter(vessel_type == "Fishing") %>% as.data.table
# count companies per fleet size and collect their names for labelling
fish_vessel_sum <- fish_vessel[, .(company_count = .N, company_names = toString(vessel_company)), by = vessel_type_count]

#enforcing all x axis values for clearer depiction by introducing breaks
fish_v_count_range <- seq(min(fish_vessel_sum$vessel_type_count), max(fish_vessel_sum$vessel_type_count))

# introduce wrap text function to limit company name within the column
wrap_text <- function(text, width = 15) {
  sapply(text, function(x) {
    paste(strwrap(x, width = width), collapse = "\n")
  })
}

# applying to column
fish_vessel_sum$wrapped_company_names <- wrap_text(fish_vessel_sum$company_names)

# creating plot
company_vessel_count <- ggplot(fish_vessel_sum, aes(x = vessel_type_count, y = company_count)) +
  geom_bar(stat = "identity", fill = "#00CCCC", color = "black") +
  geom_text(aes(label = company_count), vjust = -0.5, size = 3) +
  scale_x_continuous(breaks = fish_v_count_range) +
  labs(title = "No. of fishing vessels Owned by companies",
       x = "Number of Vessels Owned",
       y = "Count of Companies") +
  theme_minimal(base_size = 8) + 
  theme(axis.line = element_line(color = "black"))+ 

# annotation: highlight and label companies owning 4 or more fishing vessels (labels are hard-coded)
 annotate("rect", xmin = 3.5, xmax = 6.5, ymin = 0, ymax = Inf, 
           alpha = 0, color = "#FFBF00", fill = NA) +

 annotate("text", x = 6, y = 5, label = "Ritter Ltd", size = 2) +
  annotate("text", x = 4, y = 20, label = "McCormick Group,\nNielsen,\nValentine and Bell,\nRobinson,\nMcMillan and Watson,\nSolis-Lopez", size = 2)
Code
# creating subset
no_of_interest <- 4  # minimum number of fishing vessels owned
company_of_interest <- fish_vessel %>% filter(vessel_type_count >= no_of_interest)

int_fish_v_mapping <- N_vessel %>% filter(vessel_company %in% company_of_interest$vessel_company) %>% select(vessel_id, vessel_company, tonnage)

# data wrangling to fit into network graph
edges <- int_fish_v_mapping %>%
  select(vessel_id, vessel_company)

# Create nodes for vessels
nodes <- int_fish_v_mapping %>%
  select(vessel_id, tonnage) %>%
  distinct() %>%
  rename(name = vessel_id) %>%
  mutate(type = "vessel")

# Create nodes for companies
company_nodes <- data.frame(name = unique(int_fish_v_mapping$vessel_company)) %>%
  mutate(type = "company")

# Combine nodes
all_nodes <- bind_rows(nodes, company_nodes)

# Create the graph object using igraph
network <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = FALSE)

# Add tonnage as a vertex attribute, ensuring NA values are handled
V(network)$tonnage <- ifelse(is.na(V(network)$tonnage), 0, V(network)$tonnage)

# Add node type as a vertex attribute
V(network)$type <- all_nodes$type

# Plot the network graph using ggraph
map_vessel_company <- ggraph(network, layout = 'fr') +
  geom_edge_link(edge_alpha = 0.5, show.legend = FALSE) +
  geom_node_point(aes(size = tonnage, color = type), show.legend = TRUE) +
  geom_node_text(aes(label = name), repel = TRUE, size = 2) +
  scale_color_manual(values = c("vessel" = "#00CCCC", "company" = "black")) +
  theme_void(base_size = 8) +
  labs(title = "Network Graph of Vessels and Companies",
       size = "Tonnage",
       color = "Node Type") + theme(legend.position = "bottom")

company_vessel_count | map_vessel_company

Code
# cleaning environment
rm(no_of_interest, company_of_interest, int_fish_v_mapping, edges, nodes, company_nodes, all_nodes, network, map_vessel_company)
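The data.table step above groups companies by fleet size. As a minimal sketch of the same summary in base R, the block below applies `tapply()` to a hypothetical `toy_fleet` table; the company names and counts are invented for illustration and are not taken from the VAST data.

```r
# Hypothetical toy data: one row per company and the number of fishing vessels it owns
toy_fleet <- data.frame(
  vessel_company    = c("Alpha Ltd", "Beta Co", "Gamma Inc", "Delta plc", "Epsilon LLC"),
  vessel_type_count = c(1, 1, 4, 4, 6),
  stringsAsFactors  = FALSE
)

# For each fleet size: how many companies own that many vessels, and which ones
company_count <- tapply(toy_fleet$vessel_company, toy_fleet$vessel_type_count, length)
company_names <- tapply(toy_fleet$vessel_company, toy_fleet$vessel_type_count, toString)

fleet_summary <- data.frame(
  vessel_type_count = as.numeric(names(company_count)),
  company_count     = as.vector(company_count),
  company_names     = as.vector(company_names),
  stringsAsFactors  = FALSE
)
print(fleet_summary)
```

Grouping on the names column, not the count column, is what makes `company_names` list the companies.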

3.2.2: Cargo Vessels with Largely Unknown Ownership

Steps taken:

  1. Filter unregistered vessels (vessel_company = “Unknown”) and include each vessel's flag country.
  2. Plot the count of unknown vessels by flag country and generate a data table listing these countries.
  3. Plot the dwell time at each geographical area as boxplots (marking each area's median) and highlight the records of vessels from unknown companies in “hot pink”.
  4. Return the list of suspicious foreign cargo vessels that have over-stayed (dwell time above the 75th percentile for that area), together with the areas where they over-stayed.
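Step 4 can be sketched in base R with `ave()`, which computes a per-area statistic and repeats it on every row of that area; this stands in for the `group_by()` / `summarize()` / `left_join()` chain used later. The `pings` table, its values and its `company` column are hypothetical toy data.

```r
# Hypothetical toy ping records: vessel, owner, area and dwell time
pings <- data.frame(
  vessel_id = c("v1", "v2", "v3", "v4", "v1", "v2", "v3", "v4"),
  company   = c("Unknown", "Known", "Known", "Unknown", "Unknown", "Known", "Known", "Unknown"),
  area      = rep(c("Nav 1", "Cod Table"), each = 4),
  dwell     = c(100, 200, 300, 900, 50, 60, 70, 500),
  stringsAsFactors = FALSE
)

# 75th percentile of dwell within each area, repeated on every row of that area
pings$q75_dwell <- ave(pings$dwell, pings$area,
                       FUN = function(d) quantile(d, 0.75, names = FALSE))

# Unknown-company records that dwell longer than their area's 75th percentile
over_stay <- unique(pings[pings$company == "Unknown" & pings$dwell > pings$q75_dwell,
                          c("vessel_id", "area")])
print(over_stay)
```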

Visualisation Improvements:

  1. A common color scheme identifies vessel types: brown for cargo vessels and shades of grey for other types of vessels.
  2. The bars are reordered by vessel count to make the chart easier to read.
  3. The geographical areas are reordered by region, with each region shaded in the common color scheme as background.
  4. The dwell-time axis is capped, as no records from “Unknown” companies exceed the cap. This keeps the scale tight enough to see the “Unknown” vessels’ dwell times clearly.
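Capping the axis has a subtlety in ggplot2: limits set on a scale, such as `scale_y_continuous(limits = ...)`, remove out-of-range observations before statistics like boxplot quartiles are computed, while limits set on the coordinate system, such as `coord_cartesian(ylim = ...)` or `coord_flip(ylim = ...)`, only zoom the view. The base-R sketch below uses made-up dwell values to show how far dropping the values above a cap can move the median a boxplot draws.

```r
# Hypothetical dwell times: mostly short stays plus a few very long ones
dwell <- c(10, 20, 30, 40, 50, 60, 900000, 950000, 990000)
cap <- 500000

# Zooming (coordinate limits) keeps all data, so the boxplot median is unchanged
zoomed_median <- median(dwell)

# Scale limits drop out-of-range values first, so the median uses fewer points
dropped_median <- median(dwell[dwell <= cap])

c(zoomed = zoomed_median, dropped = dropped_median)
```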

Insights:

  1. None of the vessels from unknown companies is a fishing vessel. This may be reasonable, as only vessels registered with Oceanus should be authorised to fish within Oceanus’ waters.
  2. 27 foreign countries have 2 or more vessels registered as Cargo Vessels within Oceanus vessel records.
  3. We expect foreign cargo vessels to have short dwell times, as they pass through Oceanus waters on their journey to other locations. However, certain vessels have notably long dwell times and are likely to be suspicious, as they are over-staying.
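The count behind insight 2 can be sketched with `table()`: keep foreign cargo vessels, tally them by flag country, and keep the countries with two or more. The country names and records below are hypothetical placeholders, not the VAST data.

```r
# Hypothetical unknown-company vessels with flag country and vessel type
unknown_toy <- data.frame(
  flag_country = c("Oceanus", "Utopia", "Utopia", "Atlantis", "Atlantis", "Atlantis", "Lemuria"),
  vessel_type  = c("Cargo_Vessel", "Cargo_Vessel", "Cargo_Vessel", "Cargo_Vessel",
                   "Cargo_Vessel", "Tour", "Cargo_Vessel"),
  stringsAsFactors = FALSE
)

# Foreign cargo vessels only (exclude Oceanus and non-cargo types)
cargo_foreign <- unknown_toy[unknown_toy$vessel_type == "Cargo_Vessel" &
                               unknown_toy$flag_country != "Oceanus", ]

# Tally per country and keep countries with two or more cargo vessels
cargo_per_country <- table(cargo_foreign$flag_country)
countries_2plus <- names(cargo_per_country)[cargo_per_country >= 2]
countries_2plus
```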
Code
# exploring ships that have unknown companies

unknown_v <- N_vessel %>% filter(vessel_company == "Unknown")

total_vessel_count <- unknown_v %>%
  group_by(flag_country) %>%
  summarize(total_vessel_count = n()) %>%
  arrange(desc(total_vessel_count))

# reorder flag_country to ensure "Oceanus" is first, followed by descending order of vessel count
ordered_countries <- c("Oceanus", setdiff(total_vessel_count$flag_country, "Oceanus"))

# Summarizing to see where unknown vessels come from
# (.groups = "drop" so the factor levels below are built from all rows, not per group)
unknown_v_sum <- unknown_v %>% 
  group_by(vessel_type, flag_country) %>% 
  summarize(vessel_count = n(), .groups = "drop") %>%
  mutate(
    # Reorder flag_country with Oceanus first and the rest in descending order of vessel count
    flag_country = factor(flag_country, levels = ordered_countries),
    # Reorder vessel_type to ensure "Cargo_Vessel" is at the bottom
    vessel_type = factor(vessel_type, levels = c("Cargo_Vessel", setdiff(unique(vessel_type), "Cargo_Vessel"))))

# defining vessel colors
vessel_colors <- c("Cargo_Vessel" = "#994C00", 
    "Ferry_Cargo" = "#C0C0C0", 
    "Ferry_Passenger" = "#E0E0E0",
    "Research" = "#A0A0A0",
    "Tour" = "#606060", 
    "Other" = "#000000")

unknown_dist <- ggplot(unknown_v_sum, 
                       aes(x = flag_country, y = vessel_count, 
                           fill = vessel_type)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = vessel_colors) + 
  theme_minimal() +
  labs(title = "Vessel Count by Flag Country and Vessel Type",
       x = "Flag Country",
       y = "Vessel Count",
       fill = "Vessel Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 5), 
        axis.line = element_line(color = "black")) 

unknown_dist 

Code
datatable(unknown_v_sum, 
          options = list(pageLength = 5), 
          filter = "top")
Code
# Identifying regions that unknown vessels are at 
unknown_v_list <- unknown_v$vessel_id

# factoring to order the location by defined order
ping_activity$area <- factor(ping_activity$area, levels = c("Nav E", "Nav D", "Nav C" , "Nav B", "Nav A" , "Nav 3", "Nav 2", "Nav 1", "Ghoti Preserve", "Don Limpet Preserve", "Nemo Reef", "Cod Table","Tuna Shelf","Wrasse Beds","Exit East", "City of South Paackland", "City of Paackland","City of Lomark","City of Himark","City of Haacklee","City of Port Grove"))

vessel_dwell <- ggplot(ping_activity, aes(x = area, y = dwell)) +
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 15.5 , xmax = 20.5, alpha = 0.8, fill = "beige") +    # Ports
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 11.5, xmax = 14.5, alpha = 0.5, fill = "lightblue") +  # Fishing ground
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 8.5, xmax = 11.5, fill = "lightgreen", alpha = 0.5) +  # Ecological preserve
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 0.5, xmax = 8.5, alpha = 0.5, fill = "lightgrey") +
  geom_boxplot() +
  geom_point(data = subset(ping_activity, vessel_id %in% unknown_v_list),
             aes(x = area, y = dwell), color = "hotpink", size = 1) +
  theme_minimal() +
  labs(title = "Median Dwell Time by Area", 
       x = "Area", 
       y = "Median Dwell Time") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),axis.line = element_line(color = "black")) + 
  coord_flip(ylim = c(0, 500000))  # zoom to 5 * 10^5; coord limits keep all data in the boxplot statistics

vessel_dwell

Code
# extracting records where vessel_company is unknown and dwell exceeds the 75th percentile of each area

dwell_stats <- ping_activity %>%
  group_by(area) %>%
  summarize(
    median_dwell = median(dwell, na.rm = TRUE),
    q75_dwell = quantile(dwell, 0.75, na.rm = TRUE)
  ) 

# Join the dwell stats back to the unknown vessels dataset
unknown_vessels_with_stats <- ping_activity %>% 
  filter(vessel_id %in% unknown_v_list) %>% 
  left_join(dwell_stats, by = "area")

# Filter records where dwell time exceeds the 75th percentile, focusing on cargo vessels and country not Oceanus
exceeding_75th_percentile <- unknown_vessels_with_stats %>%
  filter(dwell > q75_dwell, flag_country != "Oceanus", vessel_type == "Cargo_Vessel") %>% 
  distinct(vessel_id, area, flag_country) %>% arrange(area)

datatable(exceeding_75th_percentile, 
          options = list(pageLength = 5), 
          filter = "top")
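The highlight layers and filters in this section select records for a whole set of vessel IDs. In R, comparing a column to a vector of IDs with `==` recycles the shorter vector and compares position by position, so it silently misses most matches; `%in%` tests membership and is the operator such selections need. A small base-R illustration with made-up IDs:

```r
ids <- c("a", "b", "c", "d")
of_interest <- c("b", "c")

# `==` recycles of_interest to c("b", "c", "b", "c") and compares element by element
by_equals <- ids[ids == of_interest]

# `%in%` checks each id against the whole set
by_membership <- ids[ids %in% of_interest]

list(by_equals = by_equals, by_membership = by_membership)
```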

3.3 Exploring distribution of Vessels by Tonnage

Steps taken:

  1. Plot a scatter graph of tonnage against overall length to check for correlation between vessel capacity and vessel size.

  2. Given that most of the large vessels are “Cargo Vessels” and most small vessels are “Fishing” vessels, summarise the count of each vessel type in a bar chart.

  3. Plot histograms of tonnage for “Fishing” and “Cargo Vessels” and identify 3 clusters of interest.

  4. Next, explore the median dwell time per area for each of these 3 subsets relative to other vessels of the same type.
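Step 1 judges correlation visually; a single rank correlation can complement the scatter plot. The sketch below is an assumption on our part (the analysis above does not compute one) and uses Spearman's coefficient because it only assumes a monotonic relationship, which suits the right-skewed tonnage. The vessel sizes are hypothetical.

```r
# Hypothetical vessel sizes (tonnage and overall length); not the VAST data
toy_vessels <- data.frame(
  tonnage        = c(100, 600, 2500, 5000, 15000, 25000, 75000),
  length_overall = c(12, 20, 35, 48, 95, 110, 290)
)

# Spearman's rank correlation: Pearson correlation computed on the ranks
rho <- cor(toy_vessels$tonnage, toy_vessels$length_overall, method = "spearman")
rho
```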

Insights:

  1. Vessel size varies widely, which may be linked to transhipment vessels.
  2. There are 178 fishing vessels, 100 cargo vessels, and 18 remaining vessels of type “Research, Tour, Ferry Passenger, Ferry Cargo and Other”.
  3. Based on the distribution of vessel tonnage for “Fishing” and “Cargo Vessels”, we identified 3 subsets of interest.
    1. Subset 1: Unusually large fishing vessels (Tonnage > 12,500)
    2. Subset 2: Substantial group of small cargo vessels (Tonnage < 5,000), comparable to the upper quartile (75th percentile) of fishing vessel tonnage, 4,850
    3. Subset 3: Unusually large cargo vessels (Tonnage > 70,000)
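The three subsets amount to simple threshold rules on vessel type and tonnage. A minimal sketch, using a hypothetical helper `classify_subset()` and toy rows (neither is part of the original analysis), with the thresholds taken from the insights above:

```r
# Hypothetical helper: label each vessel with the subset of interest it falls in
classify_subset <- function(vessel_type, tonnage) {
  ifelse(vessel_type == "Fishing" & tonnage > 12500, "Subset 1: large fishing",
  ifelse(vessel_type == "Cargo_Vessel" & tonnage < 5000, "Subset 2: small cargo",
  ifelse(vessel_type == "Cargo_Vessel" & tonnage > 70000, "Subset 3: large cargo",
         "Other")))
}

# Hypothetical toy vessels
toy <- data.frame(
  vessel_type = c("Fishing", "Fishing", "Cargo_Vessel", "Cargo_Vessel", "Cargo_Vessel", "Tour"),
  tonnage     = c(17200, 2400, 2100, 23750, 76300, 80000),
  stringsAsFactors = FALSE
)
toy$subset <- classify_subset(toy$vessel_type, toy$tonnage)
toy
```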
Code
# Defining vessel colors 
vessel_colors <- c("Fishing" = "#00CCCC", 
    "Cargo_Vessel" = "#994C00", 
    "Ferry_Cargo" = "#C0C0C0", 
    "Ferry_Passenger" = "#C0C0C0",
    "Research" = "#C0C0C0",
    "Tour" = "#C0C0C0", 
    "Other" = "#C0C0C0")

#unique(N_vessel$vessel_type)

scatter_ton_len <- ggplot(data= N_vessel, 
            aes(x= tonnage, 
                y= length_overall, 
                color= vessel_type)) +
  geom_point() + 
  scale_color_manual(values = vessel_colors) + 
  labs(title = "Scatter Plot of Vessels by Size", 
       x = "Vessel Tonnage", 
       y = "Vessel Length") + 
  theme_minimal(base_size = 8) + 
  theme(legend.position = "top", 
        axis.line = element_line(color = "black"))

vessel_count <- N_vessel %>%
  group_by(vessel_type) %>%
  summarize(vessel_no = n()) %>% 
  mutate(vessel_type = reorder(vessel_type, - vessel_no))


bar_vessel_type <- ggplot(data = vessel_count, 
                          aes(x = vessel_type, 
                              y = vessel_no, fill = vessel_type)) + 
  geom_bar(stat = "identity") + 
  scale_fill_manual(values = vessel_colors) + 
  geom_text(aes(label = vessel_no), vjust = -0.8, size = 2) +
  labs(title = "Count of Vessels by Type", 
x = "Type of vessels registered", 
y = "No. of vessels") + 
  theme_minimal(base_size = 8) + 
  theme(axis.text.x = element_text(size = 5), 
        axis.line = element_line(color = "black")) 

scatter_ton_len | bar_vessel_type

Code
rm(scatter_ton_len, bar_vessel_type, vessel_count)
Code
# segmenting data set to focus on fishing and cargo vessels 

fishing_v <- N_vessel %>% filter(vessel_type == "Fishing")
cargo_v <- N_vessel %>% filter(vessel_type == "Cargo_Vessel")

# calculating the quantiles for the respective vessel type 

fishing_v_ton_quant <- fishing_v %>%
  summarise(
    q25 = quantile(tonnage, 0.25),
    median = median(tonnage),
    q75 = quantile(tonnage, 0.75)
  )

cargo_v_ton_quant <- cargo_v %>%
  summarise(
    q25 = quantile(tonnage, 0.25),
    median = median(tonnage),
    q75 = quantile(tonnage, 0.75)
  )

# plot for fishing vessel distribution of tonnage
fishing_v_dist <- ggplot(fishing_v, aes(x = tonnage)) +
  geom_histogram(binwidth = 500, fill = "#00CCCC", color = "black") +
  annotate("rect", xmin = 12500, xmax = Inf, ymin = -Inf, ymax = Inf, 
    alpha = 0, color = "#FFBF00") +
  geom_vline(xintercept = fishing_v_ton_quant$q25, 
    color = "blue", linetype = "dashed") +
  geom_vline(xintercept = fishing_v_ton_quant$median,
    color = "red", linetype = "dashed") +
  geom_vline(xintercept = fishing_v_ton_quant$q75, 
    color = "blue", linetype = "dashed") +
  annotate("text", x = fishing_v_ton_quant$q25, y = 30, 
    label = paste("25%", fishing_v_ton_quant$q25) , vjust = -1, size = 3) +
  annotate("text", x = fishing_v_ton_quant$median, y = 30, 
    label = paste("50%",fishing_v_ton_quant$median), vjust = -1, size = 3) +
  annotate("text", x = fishing_v_ton_quant$q75, y = 30, 
    label = paste("75%", fishing_v_ton_quant$q75),  vjust = -1, size = 3) + 
  labs(title = "Distribution of Fishing Vessels by Tonnage",
       x = "Tonnage",
       y = "Frequency") +
  theme_minimal(base_size = 8) + 
  theme(axis.line = element_line(color = "black")) + 
  coord_flip()

# Adjust label to readable format 
# Adjust order so that the text is above the 

cargo_v_dist <- ggplot(cargo_v, aes(x = tonnage)) +
  geom_histogram(binwidth = 500, fill = "#994C00", color = "black") +
  annotate("rect", xmin = -Inf, xmax = 5000, ymin = -Inf, ymax = Inf, alpha = 0, color = "#FFBF00") +
  annotate("rect", xmin = 70000, xmax = Inf, ymin = -Inf, ymax = Inf, alpha = 0, color = "#FFBF00") +
  geom_vline(xintercept = cargo_v_ton_quant$q25, color = "blue", linetype = "dashed") +
  geom_vline(xintercept = cargo_v_ton_quant$median, color = "red", linetype = "dashed") +
  geom_vline(xintercept = cargo_v_ton_quant$q75, color = "blue", linetype = "dashed") +
  annotate("text", x = cargo_v_ton_quant$q25, y = 30, 
    label = paste("25%", cargo_v_ton_quant$q25), vjust = -1, size = 3) +
  annotate("text", x = cargo_v_ton_quant$median, y = 30, 
    label = paste("50%",cargo_v_ton_quant$median), vjust = -1, size = 3) +
  annotate("text", x = cargo_v_ton_quant$q75, y = 30, 
    label = paste("75%", cargo_v_ton_quant$q75), vjust = -1, size = 3) +
  labs(title = "Distribution of Cargo Vessels by Tonnage",
       x = "Tonnage",
       y = "Frequency") +
  theme_minimal(base_size = 8) + 
  theme(axis.line = element_line(color = "black")) + 
  coord_flip()

#summary(fishing_v) - min: 100, q1: 600, median: 2400, q3: 4850, max: 17200 - investigate
#summary(cargo_v) - min: 2100, q1: 2100, median: 23750, q3: 74925, max: 76300 - Investigate 

fishing_v_dist | cargo_v_dist

3.4 Exploring Dwell of Vessels of Interest

3.4.1: Dwell of unusually large fishing vessels which may point to over-fishing

Steps taken:

  1. Filter for “Fishing” vessels with tonnage > 12,500 and identify the countries and companies these vessels belong to.
  2. Plot the dwell times of these unusually large fishing vessels in each geographical area against the dwell-time distribution of all fishing vessels.
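The comparison in step 2 can be summarised numerically: each vessel's median dwell per area against the fleet-wide median for that area. A base-R sketch with `aggregate()` and `merge()` on hypothetical records (vessel names and dwell values are invented):

```r
# Hypothetical fishing-vessel pings
fish_pings <- data.frame(
  vessel_id = c("big1", "big1", "big1", "small1", "small1", "small2", "small2"),
  area      = c("Cod Table", "Cod Table", "Nemo Reef", "Cod Table", "Nemo Reef", "Cod Table", "Nemo Reef"),
  dwell     = c(9000, 11000, 4000, 2000, 500, 3000, 700),
  stringsAsFactors = FALSE
)

# Median dwell of each vessel in each area
per_vessel <- aggregate(dwell ~ vessel_id + area, data = fish_pings, FUN = median)

# Median over all fishing pings in each area, for comparison
fleet <- aggregate(dwell ~ area, data = fish_pings, FUN = median)
names(fleet)[2] <- "fleet_median"

# Ratio above 1 means the vessel lingers longer than is typical for that area
comparison <- merge(per_vessel, fleet, by = "area")
comparison$ratio <- comparison$dwell / comparison$fleet_median
comparison
```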

Visualisation Improvements:

  1. Factor the areas to order and group them by their nature: beige for islands / ports, light blue for fishing grounds, light green for ecological preserves and light grey for navigation buoys, leaving the outlier Exit East unshaded.
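The shaded bands use hard-coded positions (e.g. 15.5 to 20.5), which shift when an area has no records, as in the cargo plot, because ggplot drops unused factor levels from a discrete axis. A hypothetical helper `region_bands()` could derive the band edges from the ordered areas actually present; the region assigned to each area below is our reading of the color scheme described above.

```r
# Areas in plotting order, each tagged with its region
area_region <- data.frame(
  area   = c("Nav E", "Nav D", "Nav C", "Nav B", "Nav A", "Nav 3", "Nav 2", "Nav 1",
             "Ghoti Preserve", "Don Limpet Preserve", "Nemo Reef",
             "Cod Table", "Tuna Shelf", "Wrasse Beds", "Exit East",
             "City of South Paackland", "City of Paackland", "City of Lomark",
             "City of Himark", "City of Haacklee", "City of Port Grove"),
  region = c(rep("Navigation", 8), rep("Ecological preserve", 3),
             rep("Fishing ground", 3), "Exit", rep("Port", 6)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: band edges for consecutive runs of the same region,
# counting only areas present in the data (each gets one axis slot)
region_bands <- function(area_region, present_areas) {
  kept   <- area_region[area_region$area %in% present_areas, ]
  runs   <- rle(kept$region)
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(region = runs$values, xmin = starts - 0.5, xmax = ends + 0.5,
             stringsAsFactors = FALSE)
}

# Example: every area except City of Port Grove has records
bands <- region_bands(area_region, setdiff(area_region$area, "City of Port Grove"))
bands
```

The resulting `xmin`/`xmax` columns could feed `annotate("rect", ...)` calls, or a `geom_rect()` layer, instead of typed-in numbers.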

Insights:

  1. Two vessels, “marinemarauder8c9” and “pikepirate89a”, have tonnage > 12,500.
  2. “marinemarauder8c9” lingers at the Nemo Reef ecological preserve.
  3. The two vessels take distinct paths and spend reasonable time in authorised fishing regions.
  4. However, “pikepirate89a” has persistently stayed at “Cod Table”, prompting further investigation into potential over-fishing once vessels are mapped to cargo IDs.
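One way to quantify "persistently stayed" is each area's share of a vessel's total dwell. The sketch below is an illustration, not the method used above: the records are hypothetical, and the rule "more than half of a vessel's dwell in one area" is an assumed threshold.

```r
# Hypothetical dwell records for two vessels
track <- data.frame(
  vessel_id = c("pikeA", "pikeA", "pikeA", "pikeA", "marB", "marB", "marB"),
  area      = c("Cod Table", "Cod Table", "Cod Table", "Nav 1", "Nemo Reef", "Nav 2", "Tuna Shelf"),
  dwell     = c(40000, 30000, 20000, 10000, 50000, 30000, 20000),
  stringsAsFactors = FALSE
)

# Total dwell per vessel and area, then each area's share of that vessel's total
totals <- aggregate(dwell ~ vessel_id + area, data = track, FUN = sum)
totals$share <- totals$dwell / ave(totals$dwell, totals$vessel_id, FUN = sum)

# Assumed rule: flag areas holding more than half of a vessel's total dwell
persistent <- totals[totals$share > 0.5, c("vessel_id", "area", "share")]
persistent
```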
Code
# identify the unusually large fishing vessels (tonnage > 12,500)
abn_fish_vessel <- fishing_v %>% filter(tonnage > 12500)

sel_abn_fish_vessel <- abn_fish_vessel %>% 
  select(vessel_name, flag_country, vessel_company) 

datatable(sel_abn_fish_vessel, 
          options = list(pageLength = 5)) 
Code
# understanding which areas these vessels spend time in

abn_fish_v_activity <- ping_activity %>% 
  filter(vessel_id %in% abn_fish_vessel$vessel_id) 
  
abn_fish_v_activity_sum <- abn_fish_v_activity %>%
  group_by(vessel_id, area) %>%
  summarise(median_dwell = median(dwell, na.rm = TRUE))

# Contrasting with the median time spent by fishing vessels 

fish_v_activity <- ping_activity %>% 
  filter(vessel_id %in% fishing_v$vessel_id) 
  
fish_v_activity_sum <- fish_v_activity %>%
  group_by(vessel_id, area) %>%
  summarise(median_dwell = median(dwell, na.rm = TRUE))

unique(fish_v_activity$area)
 [1] City of Haacklee        City of Lomark          City of Himark         
 [4] City of Paackland       City of South Paackland Nav 3                  
 [7] Nav D                   Nav B                   Nav A                  
[10] Nav C                   Nav 2                   Nav 1                  
[13] Exit East               Nav E                   Cod Table              
[16] Ghoti Preserve          Wrasse Beds             Nemo Reef              
[19] Don Limpet Preserve     Tuna Shelf             
21 Levels: Nav E Nav D Nav C Nav B Nav A Nav 3 Nav 2 Nav 1 ... City of Port Grove
Code
# assigning order to the plot
fish_v_activity$area <- factor(fish_v_activity$area, levels = c("Nav E", "Nav D", "Nav C" , "Nav B", "Nav A" , "Nav 3", "Nav 2", "Nav 1", "Ghoti Preserve", "Don Limpet Preserve", "Nemo Reef", "Cod Table","Tuna Shelf","Wrasse Beds","Exit East", "City of South Paackland", "City of Paackland","City of Lomark","City of Himark","City of Haacklee","City of Port Grove"))

# annotate area labels by color 

fishing_dwell <- ggplot(fish_v_activity, aes(x = area, y = dwell)) +
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 15.5 , xmax = 20.5, alpha = 0.8, fill = "beige") +    # Ports
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 11.5, xmax = 14.5, alpha = 0.5, fill = "lightblue") +  # Fishing ground
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 8.5, xmax = 11.5, fill = "lightgreen", alpha = 0.5) +  # Ecological preserve
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 0.5, xmax = 8.5, alpha = 0.5, fill = "lightgrey")  +
  geom_boxplot() +
  geom_point(data = subset(fish_v_activity, vessel_id == "marinemarauder8c9"),
             aes(x = area, y = dwell), color = "#66FFFF", size = 1) +
  geom_point(data = subset(fish_v_activity, vessel_id == "pikepirate89a"),
             aes(x = area, y = dwell), color = "#009999", size = 1) +
  theme_minimal(base_size = 8) +
  labs(title = "Median Dwell Time by Area", 
       x = "Area", 
       y = "Median Dwell Time") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        axis.line = element_line(color = "black")) + 
  coord_flip(ylim = c(0, 600000))  # zoom without dropping data from the boxplot statistics

fishing_dwell 

Code
vessels_of_interest <- c("marinemarauder8c9", "pikepirate89a")
vessel_colors <- c("marinemarauder8c9" = "#66FFFF", "pikepirate89a" = "#009999")

# creating route for selected vessel 
vessel_trajectory_selected <- vessel_trajectory %>%
  filter(vessel_id %in% vessels_of_interest)

# defining colors for X.Kind

kind_colors <- c(
  "Island" = "beige", 
  "Fishing Ground" = "lightblue", 
  "Ecological Preserve" = "lightgreen", 
  "city" = "purple", 
  "buoy" = "blue")

large_fish_v_route <- ggplot() +
  geom_sf(data = oceanus_geog, aes(fill = X.Kind), color = "black") +
  scale_fill_manual(values = kind_colors) + 
  geom_sf(data = vessel_trajectory_selected, 
          aes(color = vessel_id), 
          size = 3) +
  scale_color_manual(values = vessel_colors) + 
  geom_text(data = OceanusLocations_df, 
            aes(x = XCOORD, y = YCOORD, label = Name), 
            size = 2, hjust = 1, vjust = 1) +
  theme_minimal() +
  labs(title = "Trajectories of Unusually Large Fishing Vessels", 
       x = "Longitude", y = "Latitude", color = "Vessel ID")

large_fish_v_route

3.4.2: Dwell of small (< 5,000 tons) and unusually large (> 70,000 tons) Cargo Vessels

Possibility: Foreign countries may be fishing illegally, without authorisation, using small cargo vessels.

Steps taken & visualisation improvement: Similar to 3.4.1

Insights derived:

  1. High presence at “Unrecognised” ping locations.
  2. A substantial number of cargo vessels are comparable in size to the upper quartile of fishing vessel tonnage (4,850), and their dwell times in each area vary considerably. Further analysis will be done after vessels are matched to cargo.
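"High presence" can be made concrete as the share of a group's pings falling in each area. A base-R sketch with `table()` and `prop.table()` on hypothetical pings, comparing small cargo vessels with the remaining cargo vessels:

```r
# Hypothetical cargo pings labelled by vessel group
cargo_pings <- data.frame(
  group = c("small", "small", "small", "small", "other", "other", "other", "other"),
  area  = c("Unrecognised", "Unrecognised", "Unrecognised", "Nav 1",
            "Unrecognised", "Nav 1", "Nav 1", "City of Lomark"),
  stringsAsFactors = FALSE
)

# margin = 1 gives row-wise proportions: each group's pings sum to 1 across areas
ping_share <- prop.table(table(cargo_pings$group, cargo_pings$area), margin = 1)
round(ping_share, 2)
```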
Code
abn_cargo_vessel_s <- cargo_v %>% filter(tonnage < 5000)
abn_cargo_vessel_l <- cargo_v %>% filter(tonnage > 70000)

sel_abn_cargo_vessel_s <- abn_cargo_vessel_s %>% 
  select(vessel_name, flag_country, vessel_company) 

sel_abn_cargo_vessel_l <- abn_cargo_vessel_l %>% 
  select(vessel_name, flag_country, vessel_company) 

# display records 
datatable(sel_abn_cargo_vessel_s, 
          options = list(pageLength = 5), 
          filter = "top")
Code
datatable(sel_abn_cargo_vessel_l, 
          options = list(pageLength = 5), 
          filter = "top")
Code
# subset cargo vessel activity
cargo_v_activity <- ping_activity %>% 
  filter(vessel_id %in% cargo_v$vessel_id) 

# extracting list of vessel id for large and small cargo
selected_vessel <- abn_cargo_vessel_s$vessel_id
selected_vessel2 <- abn_cargo_vessel_l$vessel_id

# plotting
cargo_dwell <- ggplot(cargo_v_activity, aes(x = area, y = dwell)) +
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 14.5 , xmax = 19.5, alpha = 0.8, fill = "beige") +    # Ports
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 10.5, xmax = 13.5, alpha = 0.5, fill = "lightblue") +  # Fishing ground
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 7.5, xmax = 10.5, fill = "lightgreen", alpha = 0.5) +  # Ecological preserve
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 0.5, xmax = 7.5, alpha = 0.5, fill = "lightgrey") +
  geom_boxplot() +
  geom_point(data = subset(cargo_v_activity, vessel_id %in% selected_vessel),
             aes(x = area, y = dwell), color = "#FF9933", size = 1) +
  geom_point(data = subset(cargo_v_activity, vessel_id %in% selected_vessel2),
             aes(x = area, y = dwell), color = "#994C00", size = 1) +
  theme_minimal(base_size = 8) +
  labs(title = "Median Dwell Time by Area", 
       x = "Area", 
       y = "Median Dwell Time") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        axis.line = element_line(color = "black")) + 
  coord_flip(ylim = c(0, 20000))  # zoom to 2 * 10^4 without dropping data from the boxplot statistics

cargo_dwell