FishEye has learned that SouthSeafood Express Corp has been caught fishing illegally. The scandal caused a major disruption in the close-knit fishing community. FishEye has been collecting data on ship movements and shipping records in hopes that they could assemble a cohesive store of knowledge that will allow them to better understand local commercial fishing behavior.
1.2 Key Objectives of this assignment:
FishEye analysts need help to perform geographic and temporal analysis of the CatchNet data so they can prevent illegal fishing from happening again. This assignment will focus on sub-questions 1 & 4, namely:
| Sub-qn | Objectives | Techniques Used |
| --- | --- | --- |
| 1 | Visualization system to associate vessels with their probable cargos, including seasonality analysis to detect any anomalies in port exit records. | Visualising geographical data; Visualisation with Treemap |
| 4 | Visualizing changes in fishing activities after SouthSeafood Express was caught. | Analysing time series; Visual multi-variate analysis with parallel coordinates plot |
2. Data Import & Processing
2.1 Loading relevant packages and data
| Packages Used | Purpose |
| --- | --- |
| jsonlite, sf | Importing JSON file and geojson for geographical data |
2.2.1 Extracting edges & understanding edge tibble data table.
Extracting JSON file edges to tibble data frame and removing duplicates
Converting to the correct data types for datetime
Renaming columns that start with an underscore (e.g., “_date_added”) to minimize downstream syntax errors.
Creating subsets of the data table based on the “type” column.
Renaming columns to provide context and unique identifiers for downstream data mapping.
Specifically for tx_c, shifting the transaction date back by one day (T0 to T-1), as fish imports generally leave harbor one day after delivery.
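The one-day shift is easy to get wrong when the date is stored as a date-time. The following is a minimal base R sketch, assuming the column is POSIXct (which is what as_datetime() returns) and using a made-up timestamp. Subtracting a bare 1 moves the value by one second, whereas subtracting an explicit one-day difftime moves it by a full day.

```r
# Hypothetical transaction timestamp (T0), stored as POSIXct in UTC
t0 <- as.POSIXct("2035-02-01 00:00:00", tz = "UTC")

# Subtracting a bare number from POSIXct shifts by seconds, not days
minus_one_second <- t0 - 1

# Subtracting an explicit one-day difftime gives T-1
t_minus_1 <- t0 - as.difftime(1, units = "days")

format(minus_one_second, "%Y-%m-%d %H:%M:%S")
format(t_minus_1, "%Y-%m-%d")
```

Both results fall on the previous calendar day once truncated to a date, but only the second keeps the time of day intact, which matters if the column is later joined on exact timestamps.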
Resultant data sets include:
| Resultant Edges Data Set | Unique columns |
| --- | --- |
| Transactions (i.e., tx_c) | Cargo_id, Destination, Transaction date, Fish species |
| Transponder Ping (i.e., E_Tping_c) | Start_time, Dwell, Ping source, Vessel_id |
| Harbor Arrival Report (i.e., E_Hbrpt_c) | Vessel_id, Port, Key, Arrival date, Port master, Aphorism, Holiday Greeting, Saying of the Sea, Wisdom |
Code
# assigning to mc2_edges
mc2_edges <- as_tibble(mc2_data$links) %>%
  distinct()

# correcting data type - converting to datetime format
mc2_edges$time <- as_datetime(mc2_edges$time)
mc2_edges$"_last_edited_date" <- as_datetime(mc2_edges$"_last_edited_date")
mc2_edges$"_date_added" <- as_datetime(mc2_edges$"_date_added")
mc2_edges$date <- as_datetime(mc2_edges$date)

# renaming headers with "_" to prevent errors
mc2_edges <- mc2_edges %>%
  rename("last_edited_by" = "_last_edited_by",
         "date_added" = "_date_added",
         "last_edited_date" = "_last_edited_date",
         "raw_source" = "_raw_source",
         "algorithm" = "_algorithm")

# checking data format
# glimpse(mc2_edges)

# Breaking into subsets based on event category
E_TransponderPing <- subset(mc2_edges, mc2_edges$type == "Event.TransportEvent.TransponderPing")
E_HarborRpt <- subset(mc2_edges, mc2_edges$type == "Event.HarborReport")
E_Tx <- subset(mc2_edges, mc2_edges$type == "Event.Transaction")

# Dropping columns that are NULL and renaming variables to separate them
# Transactions
E_Tx_c <- E_Tx %>%
  rename(cargo_id = source, dest = target, tx_date = date) %>%
  # adjustment for records: shift by one day (T0 to T-1);
  # tx_date is a datetime, so a bare `- 1` would subtract one second
  mutate(tx_date = tx_date - days(1)) %>%
  select(-c(key, algorithm, `raw_source`, `type`, `data_author`, `aphorism`,
            `holiday_greeting`, `wisdom`, `saying of the sea`, `time`, `dwell`))

# Separating the fish species for the respective cargo
tx_sub1 <- E_Tx_c[grep("^City of", E_Tx_c$dest), ]
tx_sub2 <- E_Tx_c[!grepl("^City of", E_Tx_c$dest), ]
tx_sub2 <- tx_sub2 %>%
  rename(fish_species = dest)
tx_c <- left_join(tx_sub1,
                  tx_sub2 %>% select(cargo_id, fish_species),
                  by = "cargo_id")
# Dropping raw source: All Oceanus Centralized Export/Import Archive and Notification Service (OCEANS)
# Dropping algorithm: CatchMate ('arrrr' edition)
# Null columns - Data author, aphorism, holiday_greeting, wisdom, saying of the sea, time, dwell

# Transponder Ping
E_Tping_c <- E_TransponderPing %>%
  rename(vessel_id = target, ping_source = source, start_time = time) %>%
  select(-c(key, `algorithm`, `raw_source`, `type`, `date`, `data_author`,
            `aphorism`, `holiday_greeting`, `wisdom`, `saying of the sea`))
# Dropping raw_source: All Oceanus Vessel Locator System
# Dropping algorithm: All OVLS-Catch&Hook
# Null columns - Date, Data author, aphorism, holiday_greeting, wisdom, saying of the sea

# Harbor Report
E_Hbrpt_c <- E_HarborRpt %>%
  rename(vessel_id = source, port = target, arr_date = date,
         port_master = data_author, saying = `saying of the sea`) %>%
  select(-c(`algorithm`, `type`, `time`, `dwell`))
# Dropping algorithm: All HarborReportMaster 3.11
# Retain raw_source: Differing values depending on which Port / City

rm(tx_sub1, tx_sub2, E_TransponderPing, E_HarborRpt, E_Tx, mc2_edges, E_Tx_c)
2.2.2 Extracting nodes & understanding nodes tibble data table.
Repeating similar steps for the node records, we obtained the following resultant data sets. To minimize the number of data sets, we appended node information on vessels and regions into combined data sets, with an assigned label to identify its category (e.g., vessel_type).
| Resultant Nodes Data Sets | Unique Columns |
| --- | --- |
| N_fish (Fishes and their description) | Fish_id, fish species |
| N_Delivery_Doc (Cargo and details) | Cargo_id, quantity in tons, delivery date |
| N_vessel (Vessels and their description) | Vessel_id, vessel name, vessel company, flag country, tonnage, overall length, vessel type |
| Location_legend (Point, City, Region) | Area, activities, kind, fish_species |
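List-valued node fields such as Activities arrive as text that still carries the R list syntax. The following is a small base R sketch of the tidying idea, using made-up input strings that assume the same c("...", "...") shape seen in the node data. It strips the wrapper and quotes in a single pass, then splits the result into one vector per record.

```r
# Hypothetical values mimicking a list column that was flattened to text
activities <- c('c("Recreation", "Tourism")', "Commercial fishing")

# Remove the leading c(, the closing ), and all double quotes in one pass
tidy <- gsub('c\\(|\\)|"', "", activities)

# Split the cleaned strings into one character vector per record
split_activities <- strsplit(tidy, ", ")

tidy
split_activities
```

A single alternation pattern does the same job as three chained gsub() calls and keeps the intent in one place.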
Code
# segmenting nodes data and checking for distinct records
mc2_nodes <- as_tibble(mc2_data$nodes) %>%
  distinct()

# renaming to remove the "_"
mc2_nodes <- mc2_nodes %>%
  rename("last_edited_by" = "_last_edited_by",
         "date_added" = "_date_added",
         "last_edited_date" = "_last_edited_date",
         "raw_source" = "_raw_source",
         "algorithm" = "_algorithm")

# tidying the text data to remove nested list
mc2_nodes_tidied <- mc2_nodes %>%
  mutate(Activities = gsub("c[(]", "", Activities)) %>%
  mutate(Activities = gsub("\"", "", Activities)) %>%
  mutate(Activities = gsub("[)]", "", Activities))

mc2_nodes_tidied <- mc2_nodes_tidied %>%
  mutate(fish_species_present = gsub("c[(]", "", fish_species_present)) %>%
  mutate(fish_species_present = gsub("\"", "", fish_species_present)) %>%
  mutate(fish_species_present = gsub("[)]", "", fish_species_present))

# Creating subset on nodes information
N_fish <- subset(mc2_nodes_tidied, mc2_nodes_tidied$type == "Entity.Commodity.Fish") %>%
  select_if(~ !any(is.na(.))) %>%
  select(-c(`type`, `raw_source`, `algorithm`, `Activities`, `fish_species_present`)) %>%
  rename(fish_species = name, fish_id = id)

NL_City <- subset(mc2_nodes_tidied, mc2_nodes_tidied$type == "Entity.Location.City") %>%
  select_if(~ !any(is.na(.))) %>%
  select(-c(`raw_source`, `algorithm`, `type`, `fish_species_present`)) %>%
  rename(city_name = Name, city_id = id)

NL_Point <- subset(mc2_nodes_tidied, mc2_nodes_tidied$type == "Entity.Location.Point") %>%
  select_if(~ !any(is.na(.))) %>%
  select(-c(`raw_source`, `algorithm`, `kind`, `fish_species_present`)) %>%
  rename(point_name = Name, point_id = id)

## Need to tidy NL Region
NL_Region <- subset(mc2_nodes_tidied, mc2_nodes_tidied$type == "Entity.Location.Region") %>%
  select_if(~ !any(is.na(.))) %>%
  select(-c(`raw_source`, `algorithm`, `type`, `Description`)) %>%
  rename(region_name = Name, region_id = id, region_kind = kind)

N_Delivery_doc <- subset(mc2_nodes_tidied, mc2_nodes_tidied$type == "Entity.Document.DeliveryReport") %>%
  select_if(~ !any(is.na(.))) %>%
  rename(deliver_date = date, cargo_id = id) %>%
  select(-c(`algorithm`, `type`, `raw_source`, `Activities`, `fish_species_present`))

## consider adding back more columns, it dropped some columns where values were partial NA
N_vessel <- mc2_nodes_tidied %>%
  filter(grepl("Entity.Vessel", type)) %>%
  mutate(vessel_type = case_when(
    grepl("FishingVessel", type, ignore.case = TRUE) ~ "Fishing",
    grepl("Ferry.Passenger", type, ignore.case = TRUE) ~ "Ferry_Passenger",
    grepl("Ferry.Cargo", type, ignore.case = TRUE) ~ "Ferry_Cargo",
    grepl("Research", type, ignore.case = TRUE) ~ "Research",
    grepl("Other", type, ignore.case = TRUE) ~ "Other",
    grepl("Tour", type, ignore.case = TRUE) ~ "Tour",
    grepl("CargoVessel", type, ignore.case = TRUE) ~ "Cargo_Vessel"
  )) %>%
  # Handle NA values by replacing NA with "Unknown"
  mutate(company = ifelse(is.na(company), "Unknown", company)) %>%
  select(-c(`algorithm`, `type`, `raw_source`, `Activities`, `fish_species_present`,
            `Description`, `kind`, `style`, `name`, `qty_tons`, `date`)) %>%
  rename(vessel_id = id, vessel_name = Name, vessel_company = company)

# Further exploring records where there are null values
partial_na_records <- N_vessel[!complete.cases(N_vessel), ] %>%
  select(-c(last_edited_by, date_added, last_edited_date))

partial_na_sum <- partial_na_records %>%
  group_by(vessel_type) %>%
  summarize(count = n())

# Plot NA records
partial_na_plot <- ggplot(partial_na_sum, aes(x = vessel_type, y = count)) +
  geom_bar(stat = "identity", fill = "grey") +
  geom_text(aes(label = count), vjust = -0.2, size = 4) +
  labs(title = "Breakdown of vessel type for partial NA records",
       x = "Vessel Type",
       y = "Count of vessel with NA") +
  theme_minimal(base_size = 12) +
  theme(axis.line = element_line(color = "black"))

partial_na_plot

# merging ping with location details (region_id, city_id, point_id)
## See how to deal with list within activities, and how to include region_kind
## See if want to retain some description to identify point, region vs city
city_legend <- NL_City %>%
  select(c(`city_id`, `Activities`, `kind`)) %>%
  mutate(fish_species_present = "NA") %>%
  rename(area = city_id)

point_legend <- NL_Point %>%
  select(c(`point_id`, `Activities`)) %>%
  mutate(kind = "point", fish_species_present = "NA") %>%
  rename(area = point_id)

region_legend <- NL_Region %>%
  select(c(`region_id`, `Activities`, `region_kind`, `fish_species_present`)) %>%
  rename(area = region_id, kind = region_kind)

location_legend <- rbind(city_legend, point_legend, region_legend)

write_csv(N_vessel, "data/N_vessel.csv")
write_csv(location_legend, "data/location_legend.csv")

# dropping unnecessary tables
rm(mc2_data, mc2_nodes_tidied, partial_na_records, city_legend, point_legend,
   region_legend, partial_na_sum, partial_na_plot, mc2_nodes)
Insights:
Total of 18 vessels with partial NA
All 18 vessels have an “Unknown” vessel company and are flagged to Oceanus (flag_country = Oceanus).
All 18 vessels fall under non fishing and non cargo_vessel type, hence, they will be excluded from our analysis.
2.2.5 Merging back the data after processing.
To incorporate the context of the node details into the various edges, the related node descriptions were appended to the edge data sets. This helped to streamline the records into 3 consolidated data sets.
| Consolidated Data Set | Resultant Edge Data Set | Mapped with Nodes Data Set |
| --- | --- | --- |
| Transaction (with cargo weight) | tx_c: Cargo_id, Destination, Transaction date, Fish species | N_delivery_doc: Cargo_id, quantity in tons, delivery date |
| Ping activity (with vessel details by vessel_id, possible fish caught by location_legend) | | |
Import the ESRI layer, including the centroid details of the geographical areas.
Extract coordinates from the geographical data frame and append them to the vessel movement data frame to plot the movement of vessels.
Filter the vessels of interest to those belonging to “South Seafood Express Corp” and plot the routes taken by its 2 vessels, namely “Snappersnatcher7be” and “Roachrobberdb6”.
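The steps above can be sketched on a toy example. This is a minimal illustration that assumes the sf and dplyr packages are installed; the vessel names, timestamps and coordinates are invented and only mimic the shape of the vessel movement data.

```r
library(sf)
library(dplyr)

# Hypothetical pings: two vessels, coordinates are illustrative only
pings <- data.frame(
  vessel_id  = c("vessel_a", "vessel_a", "vessel_a", "vessel_b", "vessel_b"),
  start_time = as.POSIXct(
    c("2035-02-01 08:00:00", "2035-02-01 06:00:00", "2035-02-01 10:00:00",
      "2035-02-01 07:00:00", "2035-02-01 09:00:00"),
    tz = "UTC"
  ),
  lon = c(-165.5, -166.0, -165.0, -164.5, -164.0),
  lat = c(39.0, 38.5, 39.5, 40.0, 39.8)
)

trajectories <- pings %>%
  # turn the coordinate columns into WGS 84 point geometries
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
  # order each vessel's pings in time so the line follows the actual route
  arrange(vessel_id, start_time) %>%
  group_by(vessel_id) %>%
  # keep points in order (no union), then cast each group to a line
  summarise(do_union = FALSE) %>%
  st_cast("LINESTRING")

trajectories
```

Sorting by time before grouping is what makes the line trace the route; without it the points would be joined in file order.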
Visualisation improvement:
Color coding the regions for better identification of the type of region. (e.g., Beige for island, blue for fishing region and green for ecological reserves).
Insights from South Seafood Express Corp Vessels:
For “Snappersnatcher7be”, the commonly visited cities are City of Lomark and City of Paackland, with legal fishing at Wrasse Beds and Cod Table and possible illegal fishing activity at Ghoti Preserve.
For “Roachrobberdb6”, the commonly visited cities are City of Himark and City of Lomark, with legal fishing at Wrasse Beds and potential illegal fishing activity at Ghoti Preserve.
Reading layer `Oceanus Geography' from data source
`C:\sengjingyi\ISSS608\Take-home_Ex\Take-home_Ex03\data\shp'
using driver `ESRI Shapefile'
Simple feature collection with 27 features and 7 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -167.0654 ymin: 38.07452 xmax: -163.2723 ymax: 40.67775
Geodetic CRS: WGS 84
Code
write_rds(OceanusLocations, "data/rds/OceanusLocations.rds")

# extract coordinates from df
coords <- st_coordinates(OceanusLocations)

# drop geometry columns
OceanusLocations_df <- OceanusLocations %>%
  st_drop_geometry()

# append x and y coordinates into df as columns
OceanusLocations_df$XCOORD <- coords[, "X"]
OceanusLocations_df$YCOORD <- coords[, "Y"]

# tidy df by renaming column
OceanusLocations_df <- OceanusLocations_df %>%
  select(Name, X.Kind, XCOORD, YCOORD) %>%
  rename(Loc_Type = X.Kind)

# left join to append back to vessel movement
vessel_movement <- vessel_movement %>%
  left_join(OceanusLocations_df, by = c("area" = "Name"))

# save the vessel_movement data.frame
write_rds(vessel_movement, "data/rds/vessel_movement_data.rds")

# convert vessel movement data.frame into sf point data.frame
vessel_movement_sf <- vessel_movement %>%
  st_as_sf(coords = c("XCOORD", "YCOORD"), crs = 4326)

# arrange record based on vessel name and navigation time
vessel_movement_sf <- vessel_movement_sf %>%
  arrange(vessel_id, start_time)

# convert vessel movement sf from point into linestring features known as vessel trajectory
vessel_trajectory <- vessel_movement_sf %>%
  group_by(vessel_id) %>%
  summarize(do_union = FALSE) %>%
  st_cast("LINESTRING")

## include placeholder for vessel of interest and colors assigned
vessels_of_interest <- c("snappersnatcher7be", "roachrobberdb6")
vessel_colors <- c("snappersnatcher7be" = "red", "roachrobberdb6" = "purple")

# creating route for selected vessel
vessel_trajectory_selected <- vessel_trajectory %>%
  filter(vessel_id %in% vessels_of_interest)

# defining colors for X.Kind
kind_colors <- c("Island" = "beige",
                 "Fishing Ground" = "lightblue",
                 "Ecological Preserve" = "lightgreen",
                 "city" = "purple",
                 "buoy" = "blue")

ggplot() +
  geom_sf(data = oceanus_geog, aes(fill = X.Kind), color = "black") +
  scale_fill_manual(values = kind_colors) +
  geom_sf(data = vessel_trajectory_selected, aes(color = vessel_id), size = 1) +
  scale_color_manual(values = vessel_colors) +
  geom_text(data = OceanusLocations_df,
            aes(x = XCOORD, y = YCOORD, label = Name),
            size = 2, hjust = 1, vjust = 1) +
  theme_minimal() +
  labs(title = "Trajectories of South Seafood Express Corp",
       x = "Longitude",
       y = "Latitude",
       color = "Vessel ID")
3.1 Understanding possible fish species and signs of illegal fishing
Steps taken:
Unlist the fish species appended in the initial node of region and spread it across the table to form a matrix to identify the possible fish species identified in each region.
Contrast the fish species from region identified with the fish species caught based on the cargo transactions to detect any deviation that requires further investigation.
Plot a stacked bar graph to visualise the total quantity of fish (in tons) by destination and species.
Visualisation improvement:
Color coding the fish species representation for easier identification of illegal fish species in red and commonly caught fish species in blue, with additional fish species of “Salmon” in yellow for further investigation.
Factoring the fish species in order to reorder the layers of the stacked bar graph such that illegal fish species are clustered together at the base, followed by regular fish species and lastly the unexpected fish species at the top.
Factoring the destination city such that the graph is ordered in descending order based on transaction quantity in tons.
Including variable time frame (with earliest_date and latest_date) for selection of period of interest when looking at the summary of cargo of interest.
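The last two improvements can be illustrated with a small base R sketch. The city names follow the data set, but the dates and quantities are invented; earliest_date and latest_date play the same role as the adjustable parameters described above.

```r
# Hypothetical cargo records; quantities and dates are invented
tx <- data.frame(
  dest     = c("City of Himark", "City of Paackland", "City of Himark", "City of Lomark"),
  tx_date  = as.Date(c("2035-02-01", "2035-03-15", "2035-06-30", "2035-04-10")),
  qty_tons = c(10, 40, 5, 20)
)

# User-adjustable period of interest
earliest_date <- as.Date("2035-02-01")
latest_date   <- as.Date("2035-05-31")

# Keep only transactions inside the window
tx_window <- subset(tx, tx_date >= earliest_date & tx_date <= latest_date)

# Total tons per destination within the window
totals <- aggregate(qty_tons ~ dest, data = tx_window, FUN = sum)

# Order destinations by descending total and apply as factor levels,
# so a bar chart would draw the largest destination first
dest_order <- as.character(totals$dest[order(-totals$qty_tons)])
tx_window$dest <- factor(tx_window$dest, levels = dest_order)

levels(tx_window$dest)
```

Deriving the level order from the filtered totals, rather than hard-coding city names, keeps the bar order correct when the time window changes.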
Insights
3 fish species are present only in ecological preserves: (1) Offidiaa/Piscis osseus, (2) Sockfish/Pisces foetida and (3) Helenaa/Pisces satis.
Hence, any cargo containing these fish species has likely violated fishing regulations by fishing in one of the ecological preserves: (1) Don Limpet Preserve, (2) Ghoti Preserve or (3) Nemo Reef.
An additional species, “oncorhynchusrosea790”, was identified in the cargo transactions. Based on an internet search of the species name, this refers to the species commonly known as “Salmon”.
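The first insight amounts to a set difference between the species found in preserves and the species found in fishing grounds. The following is a base R sketch of that check, with the species lists transcribed by common name from the region table in this section; the variable names are ours.

```r
# Species present per region, transcribed from the region-species table
# (common names only, for brevity)
species_by_region <- list(
  "Don Limpet Preserve" = c("Beauvoir", "Birdseye", "Helenaa", "Sockfish", "Tuna"),
  "Ghoti Preserve"      = c("Beauvoir", "Helenaa", "Offidiaa", "Wrasse"),
  "Nemo Reef"           = c("Beauvoir", "Birdseye", "Helenaa", "Tuna", "Wrasse"),
  "Cod Table"           = c("Beauvoir", "Birdseye", "Cod"),
  "Tuna Shelf"          = c("Beauvoir", "Birdseye", "Harland", "Tuna"),
  "Wrasse Beds"         = c("Beauvoir", "Birdseye", "Wrasse")
)

preserves <- c("Don Limpet Preserve", "Ghoti Preserve", "Nemo Reef")
grounds   <- setdiff(names(species_by_region), preserves)

# Pool the species seen in each kind of region
in_preserves <- unique(unlist(species_by_region[preserves]))
in_grounds   <- unique(unlist(species_by_region[grounds]))

# Species that never appear in a legal fishing ground
preserve_only <- setdiff(in_preserves, in_grounds)
preserve_only
```

Any cargo record whose species falls in preserve_only is a candidate for further investigation.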
Code
# Formatting region data to identify fish type in region
region_species <- NL_Region %>%
  mutate(fish_species_present = gsub('c\\(|\\)|"', "", fish_species_present),
         fish_species_present = strsplit(as.character(fish_species_present), ", ")) %>%
  unnest(fish_species_present) %>%
  mutate(presence = 1) %>%
  spread(key = fish_species_present, value = presence, fill = 0)

region_species_c <- region_species %>%
  select(-region_id, -last_edited_by, -last_edited_date, -date_added)

kable(region_species_c)
| region_name | Activities | region_kind | Beauvoir/Habeas pisces | Birdseye/Pisces frigus | Cod/Gadus n.specificatae | Harland/Piscis sapidum | Helenaa/Pisces satis | Offidiaa/Piscis osseus | Sockfish/Pisces foetida | Tuna/Thunnini n.vera | Wrasse/Labridae n.refert |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Don Limpet Preserve | Recreation, Tourism | Ecological Preserve | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| Cod Table | Commercial fishing | Fishing Ground | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Tuna Shelf | Commercial fishing, Sport fishing | Fishing Ground | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| Ghoti Preserve | Research, Tourism, Recreation | Ecological Preserve | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| Nemo Reef | Recreation, Tourism | Ecological Preserve | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| Wrasse Beds | Commercial fishing | Fishing Ground | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Code
# comparing the list of unique fish species in the tx_qty
unique_fish_cargo <- unique(tx_qty$fish_species)
# unique_fish_cargo has additional species of salmon - oncorhynchusrosea790

# Aligning the naming convention for fish species
fish_species_labels <- c(
  "gadusnspecificatae4ba" = "Cod/Gadus n.specificatae",
  "piscesfrigus900" = "Birdseye/Pisces frigus",
  "piscesfoetidaae7" = "Sockfish/Pisces foetida",   # illegal
  "labridaenrefert9be" = "Wrasse/Labridae n.refert",
  "habeaspisces4eb" = "Beauvoir/Habeas pisces",
  "piscissapidum9b7" = "Harland/Piscis sapidum",
  "thunnininveradb7" = "Tuna/Thunnini n.vera",
  "piscisosseusb6d" = "Offidiaa/Piscis osseus",     # illegal
  "piscessatisb87" = "Helenaa/Pisces satis"         # illegal
)

## assign specific colors to fish species, red for illegal
fish_species_color <- c(
  "piscesfoetidaae7" = "#FF6666",
  "piscisosseusb6d" = "#FF9999",
  "piscessatisb87" = "#FFCCCC",
  "gadusnspecificatae4ba" = "#CCE5FF",
  "piscissapidum9b7" = "#99CCFF",
  "habeaspisces4eb" = "#66B2ff",
  "piscesfrigus900" = "#CCE5FF",
  "oncorhynchusrosea790" = "#FFFF99",
  "labridaenrefert9be" = "#99CCFF",
  "thunnininveradb7" = "#66b2ff"
)

# include parameters for users to change for timeframe
tx_qty$tx_date <- as.Date(tx_qty$tx_date, format = "%Y-%m-%d")
earliest_date <- min(tx_qty$tx_date, na.rm = TRUE)
latest_date <- max(tx_qty$tx_date, na.rm = TRUE)

## filtering the data set of interest
tx_qty_of_interest <- tx_qty %>%
  filter(tx_date >= earliest_date & tx_date <= latest_date)

# summarise total tons of fish per location
total_qty_tons_per_dest <- tx_qty_of_interest %>%
  group_by(dest, fish_species) %>%
  summarize(total_qty_tons = sum(qty_tons, na.rm = TRUE)) %>%
  ungroup()

# reordering levels for fish species for tidier plot
total_qty_tons_per_dest$fish_species <- factor(
  total_qty_tons_per_dest$fish_species,
  levels = c("gadusnspecificatae4ba",
             "piscissapidum9b7",
             "habeaspisces4eb",
             "piscesfrigus900",
             "labridaenrefert9be",
             "thunnininveradb7",
             "oncorhynchusrosea790",  # unidentified - Salmon
             "piscesfoetidaae7",      # illegal
             "piscisosseusb6d",       # illegal
             "piscessatisb87"))       # illegal

# reordering levels to arrange bars in descending order
total_qty_tons_per_dest$dest <- factor(
  total_qty_tons_per_dest$dest,
  levels = c("City of Paackland",
             "City of Himark",
             "City of Lomark",
             "City of South Paackland",
             "City of Haacklee"))

# plot identifying occurrence of illegal fish species at various ports -
# identifying all unique fish species in cargo report (tx_qty)
p_qty_dest <- ggplot(total_qty_tons_per_dest,
                     aes(x = dest, y = total_qty_tons, fill = fish_species)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = fish_species_color, labels = fish_species_labels) +
  labs(title = "Total Quantity of Fish by Destination and Species",
       x = "Destination",
       y = "Total Quantity (tons)",
       fill = "Fish Species") +
  theme_minimal(base_size = 8) +
  theme(axis.line = element_line(color = "black"))

p_qty_dest
Work-in-progress: Attempt to plot the type of fish species location of origin and the port received based on cargo transaction. Currently missing overlap of geographical point and polygons.
Code
# Network analysis to see mapping of fish species delivered to which port at what quantity
edges <- total_qty_tons_per_dest %>%
  select(from = fish_species, to = dest, weight = total_qty_tons, dest = dest)

# Create graph from data frame
graph <- graph_from_data_frame(edges, directed = FALSE)

# Convert to tbl_graph object
graph_tbl <- as_tbl_graph(graph)

# Plot the network graph using ggraph
fish_dest_map <- ggraph(graph_tbl, layout = 'nicely') +
  geom_edge_link(aes(width = weight), alpha = 0.8) +
  geom_node_point(aes(color = ifelse(name %in% edges$from, "Fish Species", "Destination")),
                  size = 2) +
  geom_node_text(aes(label = name), vjust = 1.5, hjust = 1.5, size = 2) +
  scale_edge_width(range = c(0.5, 5)) +
  scale_color_manual(values = c("Fish Species" = "lightblue", "Destination" = "beige")) +
  theme_void(base_size = 8) +
  labs(title = "Network Analysis of Fish Species per Destination",
       color = "Node Type",
       edge_width = "Total Quantity (tons)") +
  facet_edges(~dest) +
  th_foreground(foreground = "grey", border = TRUE) +
  theme(legend.position = "bottom")

fish_dest_map
Code
rm(edges, graph, graph_tbl, fish_dest_map)
Steps taken:
Plot a combined box plot and dot plot (ggdist::stat_dots) to highlight the quantity of cargo received per fish species at each destination, using facet_grid(. ~ dest).
Visualisation improvement
Highlight abnormal cargo records where quantity ≤ 0 in red, with a dashed blue reference line at 0 tons.
Implement facet_grid to show the reference of fish species received across various ports.
Insights:
Cargo records with quantity ≤ 0 are scattered across the cities and fish species.
Certain ports (e.g., City of Haacklee and City of Himark) receive no cargo of certain fish species. For example, “thunnininveradb7” is only offloaded at City of Haacklee and City of South Paackland.
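To quantify the first insight, the abnormal records can be flagged and counted per destination. The following is a base R sketch on invented rows; the column names mirror tx_qty, but the values are hypothetical.

```r
# Hypothetical cargo records; quantities are invented
cargo <- data.frame(
  dest         = c("City of Himark", "City of Himark", "City of Lomark", "City of Lomark"),
  fish_species = c("habeaspisces4eb", "piscesfrigus900", "habeaspisces4eb", "labridaenrefert9be"),
  qty_tons     = c(12.5, -3, 0, 8)
)

# Flag records with a non-positive quantity (same rule as the red dots)
cargo$abnormal <- cargo$qty_tons <= 0

# Count flagged records per destination
abnormal_counts <- aggregate(abnormal ~ dest, data = cargo, FUN = sum)
abnormal_counts
```

The same aggregate() call with `abnormal ~ dest + fish_species` would give the breakdown by species as well.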
Code
ggplot(tx_qty, aes(x = fish_species, y = qty_tons)) +
  geom_boxplot(
    width = .15,
    ## remove outliers
    outlier.color = NA ## `outlier.shape = NA` or `outlier.alpha = 0` works as well
  ) +
  ## add dot plots from {ggdist} package
  ggdist::stat_dots(
    ## orientation to the right
    side = "right",
    justification = -0.2,
    ## adjust grouping (binning) of observations
    binwidth = 1,
    dotsize = 0.1
  ) +
  ## add highlighted dots where qty_tons <= 0
  ggdist::stat_dots(
    data = subset(tx_qty, qty_tons <= 0),
    side = "right",
    justification = -0.2,
    binwidth = 1,
    dotsize = 0.1,
    color = "red"
  ) +
  ## add horizontal line at qty_tons = 0
  geom_hline(yintercept = 0, linetype = "dashed", color = "blue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Quantity of Fish Species by Weight (Tons)",
       x = "Fish Species",
       y = "Quantity (Tons)") +
  theme(axis.text.x = element_text(hjust = 1, size = 7),
        axis.text.y = element_text(size = 7),
        plot.title = element_text(size = 12, face = "bold"),
        strip.text = element_text(size = 7),
        plot.margin = unit(c(1, 1, 1, 1), "mm"),
        panel.spacing = unit(0.5, "lines")) +
  facet_grid(. ~ dest)
3.2 Understanding Ownership of Vessels
Steps taken:
Group vessels by company and type to count the number of vessel type per company.
Arrange the calculated table by descending order of vessels.
3.2.1: Ownership of “Fishing Vessels” largely belonging to “Oceanus”
Steps taken:
Focusing on the key category of “Fishing”, plot a bar chart of the number of companies that own a given number of vessels.
Hence we will visualise the mapping of company to vessels for company that owns 4 or more vessels. (“No of vessels of interest” is set to adjustable threshold)
Visualisation Improvements:
Summed the count of known companies with “Fishing Vessels”
Highlighted the names of companies with 4 or more fishing vessels, likely to be larger companies, based on adjustable parameter of no_of_interest.
Mapped the name of the fishing vessels associated with these companies of interest.
Insights:
Of the cargo vessels, 99 belong to an “Unknown” company; only 1 cargo vessel has a known company, “Saltwater Sisters Company Marine”.
Only Saltwater Sisters Company Marine owns 2 types of vessel; vessels of the other vessel types (Tour, Research, Other) all belong to “Unknown” companies.
All other known companies own fishing vessels, where 45 companies own 1 vessel, 36 companies own 2, 13 companies own 3 and 5 companies own 4 or more.
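The counting behind this breakdown can be sketched in base R: first count vessels per company, then count companies per fleet size. The company names and vessel ids below are placeholders, and no_of_interest is the adjustable threshold described above.

```r
# Hypothetical fishing vessels and their owners
fishing <- data.frame(
  vessel_id      = paste0("v", 1:7),
  vessel_company = c("Co A", "Co A", "Co A", "Co A", "Co B", "Co B", "Co C")
)

# Number of vessels owned by each company
vessels_per_company <- table(fishing$vessel_company)

# Number of companies that own 1, 2, 3, ... vessels
companies_per_count <- table(as.vector(vessels_per_company))

# Companies at or above the adjustable threshold
no_of_interest <- 4
company_of_interest <- names(vessels_per_company)[
  as.vector(vessels_per_company) >= no_of_interest
]

companies_per_count
company_of_interest
```

Keeping the threshold in a single variable means the highlighted companies and the network graph can be regenerated for a different cut-off without touching the rest of the code.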
Code
vessel_per_type_company <- N_vessel %>%
  group_by(vessel_company, vessel_type) %>%
  summarise(vessel_type_count = n()) %>%
  arrange(vessel_type_count)

vessel_type_company <- vessel_per_type_company %>%
  group_by(vessel_company) %>%
  summarise(vessel_count = n()) %>%
  arrange(vessel_count)

# expose datatable on the count of vessel type per company
datatable(vessel_per_type_company, options = list(pageLength = 5), filter = "top")

# Hence for plot, we will focus on vessel_type = "Fishing" and "Cargo_Vessel"
fish_vessel <- vessel_per_type_company %>%
  filter(vessel_type == "Fishing") %>%
  as.data.table

# count companies per fleet size and collect their names
# (names are taken from vessel_company, not from the count column)
fish_vessel_sum <- fish_vessel[, .(company_count = .N,
                                   company_names = toString(vessel_company)),
                               by = vessel_type_count]

# enforcing all x axis values for clearer depiction by introducing breaks
fish_v_count_range <- seq(min(fish_vessel_sum$vessel_type_count),
                          max(fish_vessel_sum$vessel_type_count))

# introduce wrap text function to limit company name within the column
wrap_text <- function(text, width = 15) {
  sapply(text, function(x) {
    paste(strwrap(x, width = width), collapse = "\n")
  })
}

# applying to column
fish_vessel_sum$wrapped_company_names <- wrap_text(fish_vessel_sum$company_names)

# creating plot
company_vessel_count <- ggplot(fish_vessel_sum,
                               aes(x = vessel_type_count, y = company_count)) +
  geom_bar(stat = "identity", fill = "#00CCCC", color = "black") +
  geom_text(aes(label = company_count), vjust = -0.5, size = 3) +
  scale_x_continuous(breaks = fish_v_count_range) +
  labs(title = "No. of fishing vessels Owned by companies",
       x = "Number of Vessels Owned",
       y = "Count of Companies") +
  theme_minimal(base_size = 8) +
  theme(axis.line = element_line(color = "black")) +
  # including annotation
  annotate("rect", xmin = 3.5, xmax = 6.5, ymin = 0, ymax = Inf,
           alpha = 0, color = "#FFBF00", fill = NA) +
  annotate("text", x = 6, y = 5, label = "Ritter Ltd", size = 2) +
  annotate("text", x = 4, y = 20,
           label = "McCormick Group,\nNielsen,\nValentine and Bell,\nRobinson,\nMcMillan and Watson,\nSolis-Lopez",
           size = 2)
Code
# creating subset
no_of_interest <- 4

company_of_interest <- fish_vessel %>%
  filter(vessel_type_count >= no_of_interest)

int_fish_v_mapping <- N_vessel %>%
  filter(vessel_company %in% company_of_interest$vessel_company) %>%
  select(vessel_id, vessel_company, tonnage)

# data wrangling to fit into network graph
edges <- int_fish_v_mapping %>%
  select(vessel_id, vessel_company)

# Create nodes for vessels
nodes <- int_fish_v_mapping %>%
  select(vessel_id, tonnage) %>%
  distinct() %>%
  rename(name = vessel_id) %>%
  mutate(type = "vessel")

# Create nodes for companies
company_nodes <- data.frame(name = unique(int_fish_v_mapping$vessel_company)) %>%
  mutate(type = "company")

# Combine nodes
all_nodes <- bind_rows(nodes, company_nodes)

# Create the graph object using igraph
network <- graph_from_data_frame(d = edges, vertices = all_nodes, directed = FALSE)

# Add tonnage as a vertex attribute, ensuring NA values are handled
V(network)$tonnage <- ifelse(is.na(V(network)$tonnage), 0, V(network)$tonnage)

# Add node type as a vertex attribute
V(network)$type <- all_nodes$type

# Plot the network graph using ggraph
map_vessel_company <- ggraph(network, layout = 'fr') +
  geom_edge_link(aes(edge_alpha = 0.5), show.legend = FALSE) +
  geom_node_point(aes(size = tonnage, color = type), show.legend = TRUE) +
  geom_node_text(aes(label = name), repel = TRUE, size = 2) +
  scale_color_manual(values = c("vessel" = "#00CCCC", "company" = "black")) +
  theme_void(base_size = 8) +
  labs(title = "Network Graph of Vessels and Companies",
       size = "Tonnage",
       color = "Node Type") +
  theme(legend.position = "bottom")

company_vessel_count | map_vessel_company
3.2.2: Ownership of Cargo Vessels which are largely unknown
Steps taken:
Filter vessels that are unregistered, where vessel_company = “Unknown”, and include detail of country based on their flags
Plot the count of unknown vessels by flag country and generate data table to list these countries
Calculate their median dwell time at each geographical area and highlight the records of vessels with unknown company in “hot pink”.
Return the list of suspicious vessels that have over-stayed (dwell time above the 75th percentile for the area) and the areas in which they over-stayed.
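The over-stay rule in the last step can be sketched in base R: compute the 75th percentile of dwell time per area, attach it to every ping, and keep the vessel and area pairs that exceed it. The vessel ids and dwell values below are invented, and the column names only mirror ping_activity.

```r
# Hypothetical pings with dwell times (in seconds)
pings <- data.frame(
  vessel_id = c("v1", "v2", "v3", "v4", "v5", "v1", "v2", "v3"),
  area      = c(rep("Cod Table", 5), rep("Nav 1", 3)),
  dwell     = c(100, 200, 300, 400, 5000, 10, 20, 900)
)

# 75th percentile of dwell per area, repeated on every row of that area
pings$q75_dwell <- ave(
  pings$dwell, pings$area,
  FUN = function(x) quantile(x, 0.75, names = FALSE)
)

# Vessel/area pairs whose dwell exceeds the area's 75th percentile
overstayed <- unique(pings[pings$dwell > pings$q75_dwell, c("vessel_id", "area")])
overstayed
```

Comparing each ping against its own area's threshold avoids flagging vessels simply because they visit areas where long stays are normal, such as ports.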
Visualisation Improvements:
Common color scheme included to identify type of vessels, e.g., brown for cargo vessels, and different shades of grey for other types of vessels.
Reordering the bar chart by no. of vessel counts to make it more readable.
Reordering the geographical areas by regions and assigning the common color scheme as background
Limit the y-axis on median dwell time as there are no records from “Unknown” companies beyond this. This allows us to scale appropriately and have clarity on the “Unknown” vessels’ dwell times.
Insights:
Of the vessels from unknown companies, there are no fishing vessels identified. This may be reasonable as only registered vessels with Oceanus should be authorised to fish within Oceanus’ waters.
27 foreign countries have 2 or more vessels registered as Cargo Vessels within Oceanus vessel records.
We expect cargo vessels from foreign countries to have short dwell times, as they are passing through Oceanus waters on their way to other locations. However, certain vessels show notably long dwell times and are likely to be suspicious, as they are over-staying.
Code
# exploring ships that have unknown companies
unknown_v <- N_vessel %>%
  filter(vessel_company == "Unknown")

total_vessel_count <- unknown_v %>%
  group_by(flag_country) %>%
  summarize(total_vessel_count = n()) %>%
  arrange(desc(total_vessel_count))

# reorder flag_country to ensure "Oceanus" is first, followed by descending order of vessel count
ordered_countries <- c("Oceanus", setdiff(total_vessel_count$flag_country, "Oceanus"))

# Summarizing to see where unknown vessels come from
unknown_v_sum <- unknown_v %>%
  group_by(vessel_type, flag_country) %>%
  summarize(vessel_count = n()) %>%
  mutate(
    # Reorder flag_country with Oceanus first and the rest in descending order of vessel count
    flag_country = factor(flag_country, levels = ordered_countries),
    # Reorder vessel_type to ensure "Cargo_Vessel" is at the bottom
    vessel_type = factor(vessel_type,
                         levels = c("Cargo_Vessel", setdiff(unique(vessel_type), "Cargo_Vessel")))
  )

# defining vessel colors
vessel_colors <- c("Cargo_Vessel" = "#994C00",
                   "Ferry_Cargo" = "#C0C0C0",
                   "Ferry_Passenger" = "#E0E0E0",
                   "Research" = "#A0A0A0",
                   "Tour" = "#606060",
                   "Other" = "#000000")

unknown_dist <- ggplot(unknown_v_sum,
                       aes(x = flag_country, y = vessel_count, fill = vessel_type)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = vessel_colors) +
  theme_minimal() +
  labs(title = "Vessel Count by Flag Country and Vessel Type",
       x = "Flag Country",
       y = "Vessel Count",
       fill = "Vessel Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 5),
        axis.line = element_line(color = "black"))

unknown_dist

# Identifying regions that unknown vessels are at
unknown_v_list <- unknown_v$vessel_id

# factoring to order the location by defined order
ping_activity$area <- factor(
  ping_activity$area,
  levels = c("Nav E", "Nav D", "Nav C", "Nav B", "Nav A", "Nav 3", "Nav 2", "Nav 1",
             "Ghoti Preserve", "Don Limpet Preserve", "Nemo Reef",
             "Cod Table", "Tuna Shelf", "Wrasse Beds",
             "Exit East",
             "City of South Paackland", "City of Paackland", "City of Lomark",
             "City of Himark", "City of Haacklee", "City of Port Grove"))

vessel_dwell <- ggplot(ping_activity, aes(x = area, y = dwell)) +
  # Ports
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 15.5, xmax = 20.5,
           alpha = 0.8, fill = "beige") +
  # Fishing ground
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 11.5, xmax = 14.5,
           alpha = 0.5, fill = "lightblue") +
  # Ecological preserve
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 8.5, xmax = 11.5,
           fill = "lightgreen", alpha = 0.5) +
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 0.5, xmax = 8.5,
           alpha = 0.5, fill = "lightgrey") +
  geom_boxplot() +
  # %in% matches every unknown vessel; == would recycle the id vector
  geom_point(data = subset(ping_activity, vessel_id %in% unknown_v_list),
             aes(x = area, y = dwell), color = "hotpink", size = 1) +
  theme_minimal() +
  labs(title = "Median Dwell Time by Area",
       x = "Area",
       y = "Median Dwell Time") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.line = element_line(color = "black")) +
  scale_y_continuous(limits = c(0, 500000)) +  # zoom into 5 * 10^5
  coord_flip()

vessel_dwell
# extracting records where vessel_company is unknown and dwell exceeds the 75th percentile of each area
dwell_stats <- ping_activity %>%
  group_by(area) %>%
  summarize(
    median_dwell = median(dwell, na.rm = TRUE),
    q75_dwell = quantile(dwell, 0.75, na.rm = TRUE)
  )

# Join the dwell stats back to the unknown vessels dataset
unknown_vessels_with_stats <- ping_activity %>%
  filter(vessel_id %in% unknown_v_list) %>%
  left_join(dwell_stats, by = "area")

# Filter records where dwell time exceeds the 75th percentile, focusing on cargo vessels and country not Oceanus
exceeding_75th_percentile <- unknown_vessels_with_stats %>%
  filter(dwell > q75_dwell, flag_country != "Oceanus", vessel_type == "Cargo_Vessel") %>%
  distinct(vessel_id, area, flag_country) %>%
  arrange(area)

datatable(exceeding_75th_percentile, options = list(pageLength = 5), filter = "top")
Plot a scatter graph of Tonnage vs Overall Length to check for any correlation between vessel capacity and vessel size.
Since most of the large vessels are “Cargo Vessels” and most of the smaller vessels are “Fishing” vessels, summarise the count of each vessel type in a bar chart.
Plot histograms of the number of “Fishing” and “Cargo Vessels” by Tonnage and identify 3 clusters of interest.
Explore the median dwell time per area for each of these 3 subsets in relation to their counterparts of the same vessel type.
Insights:
Vessel sizes vary widely, which may be linked to transhipment vessels.
There are 178 fishing vessels and 100 cargo vessels; the remaining 18 vessels are of type “Research”, “Tour”, “Ferry Passenger”, “Ferry Cargo” and “Other”.
Based on the distribution of vessel tonnage for “Fishing” and “Cargo Vessels”, we identified 3 subsets of interest:
Subset 1: Unusually large fishing vessels (Tonnage > 12,500)
Subset 2: Substantial group of small cargo vessels (Tonnage < 5,000) - Comparable to the fishing vessel 75th-percentile tonnage of 4,850 (the fishing vessel median is 2,400)
Subset 3: Unusually large cargo vessels (Tonnage > 70,000)
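The three subsets above can be expressed as simple threshold rules. The following is a minimal sketch in base R, assuming a vessel table with `vessel_type` and `tonnage` columns as in `N_vessel`; the toy rows, the helper name `tag_subset` and the subset labels are hypothetical and only serve to illustrate the thresholds.

```r
# Toy vessel table: column names mirror N_vessel, values are made up for illustration.
vessels <- data.frame(
  vessel_id   = c("v1", "v2", "v3", "v4", "v5", "v6"),
  vessel_type = c("Fishing", "Fishing", "Cargo_Vessel", "Cargo_Vessel", "Cargo_Vessel", "Tour"),
  tonnage     = c(2400, 17200, 2100, 23750, 76300, 900),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each vessel with the subset it falls into (NA when it matches none).
tag_subset <- function(vessel_type, tonnage) {
  out <- rep(NA_character_, length(tonnage))
  out[vessel_type == "Fishing" & tonnage > 12500]      <- "Subset 1: large fishing"
  out[vessel_type == "Cargo_Vessel" & tonnage < 5000]  <- "Subset 2: small cargo"
  out[vessel_type == "Cargo_Vessel" & tonnage > 70000] <- "Subset 3: large cargo"
  out
}

vessels$subset <- tag_subset(vessels$vessel_type, vessels$tonnage)

# Count of vessels per subset, including those that match none of the rules.
table(vessels$subset, useNA = "ifany")
```

Keeping the thresholds in one function makes it easier to keep the text, the histogram highlights and any later filters in agreement.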
# Defining vessel colors
vessel_colors <- c("Fishing" = "#00CCCC", "Cargo_Vessel" = "#994C00", "Ferry_Cargo" = "#C0C0C0",
                   "Ferry_Passenger" = "#C0C0C0", "Research" = "#C0C0C0", "Tour" = "#C0C0C0",
                   "Other" = "#C0C0C0")
#unique(N_vessel$vessel_type)

scatter_ton_len <- ggplot(data = N_vessel, aes(x = tonnage, y = length_overall, color = vessel_type)) +
  geom_point() +
  scale_color_manual(values = vessel_colors) +
  labs(title = "Scatter Plot of Vessels by Size", x = "Vessel Tonnage", y = "Vessel Length") +
  theme_minimal(base_size = 8) +
  theme(legend.position = "top", axis.line = element_line(color = "black"))

vessel_count <- N_vessel %>%
  group_by(vessel_type) %>%
  summarize(vessel_no = n()) %>%
  mutate(vessel_type = reorder(vessel_type, -vessel_no))

bar_vessel_type <- ggplot(data = vessel_count, aes(x = vessel_type, y = vessel_no, fill = vessel_type)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = vessel_colors) +
  geom_text(aes(label = vessel_no), vjust = -0.8, size = 2) +
  labs(title = "Count of Vessels by Type", x = "Type of vessels registered", y = "No. of vessels") +
  theme_minimal(base_size = 8) +
  theme(axis.text.x = element_text(size = 5), axis.line = element_line(color = "black"))

scatter_ton_len | bar_vessel_type
# segmenting data set to focus on fishing and cargo vessels
fishing_v <- N_vessel %>%
  filter(vessel_type == "Fishing")
cargo_v <- N_vessel %>%
  filter(vessel_type == "Cargo_Vessel")

# calculating the quantiles for the respective vessel type
fishing_v_ton_quant <- fishing_v %>%
  summarise(
    q25 = quantile(tonnage, 0.25),
    median = median(tonnage),
    q75 = quantile(tonnage, 0.75)
  )
cargo_v_ton_quant <- cargo_v %>%
  summarise(
    q25 = quantile(tonnage, 0.25),
    median = median(tonnage),
    q75 = quantile(tonnage, 0.75)
  )

# plot for fishing vessel distribution of tonnage
fishing_v_dist <- ggplot(fishing_v, aes(x = tonnage)) +
  geom_histogram(binwidth = 500, fill = "#00CCCC", color = "black") +
  annotate("rect", xmin = 12500, xmax = Inf, ymin = -Inf, ymax = Inf, alpha = 0, color = "#FFBF00") +
  geom_vline(aes(xintercept = fishing_v_ton_quant$q25), color = "blue", linetype = "dashed") +
  geom_vline(aes(xintercept = fishing_v_ton_quant$median), color = "red", linetype = "dashed") +
  geom_vline(aes(xintercept = fishing_v_ton_quant$q75), color = "blue", linetype = "dashed") +
  annotate("text", x = fishing_v_ton_quant$q25, y = 30, label = paste("25%", fishing_v_ton_quant$q25), vjust = -1, size = 3) +
  annotate("text", x = fishing_v_ton_quant$median, y = 30, label = paste("50%", fishing_v_ton_quant$median), vjust = -1, size = 3) +
  annotate("text", x = fishing_v_ton_quant$q75, y = 30, label = paste("75%", fishing_v_ton_quant$q75), vjust = -1, size = 3) +
  labs(title = "Distribution of Fishing Vessels by Tonnage",
       x = "Tonnage",
       y = "Frequency") +
  theme_minimal(base_size = 8) +
  theme(axis.line = element_line(color = "black")) +
  coord_flip()

# Adjust label to readable format
# Adjust order so that the text is above the
cargo_v_dist <- ggplot(cargo_v, aes(x = tonnage)) +
  geom_histogram(binwidth = 500, fill = "#994C00", color = "black") +
  annotate("rect", xmin = -Inf, xmax = 5000, ymin = -Inf, ymax = Inf, alpha = 0, color = "#FFBF00") +
  annotate("rect", xmin = 70000, xmax = Inf, ymin = -Inf, ymax = Inf, alpha = 0, color = "#FFBF00") +
  geom_vline(aes(xintercept = cargo_v_ton_quant$q25), color = "blue", linetype = "dashed") +
  geom_vline(aes(xintercept = cargo_v_ton_quant$median), color = "red", linetype = "dashed") +
  geom_vline(aes(xintercept = cargo_v_ton_quant$q75), color = "blue", linetype = "dashed") +
  annotate("text", x = cargo_v_ton_quant$q25, y = 30, label = paste("25%", cargo_v_ton_quant$q25), vjust = -1, size = 3) +
  annotate("text", x = cargo_v_ton_quant$median, y = 30, label = paste("50%", cargo_v_ton_quant$median), vjust = -1, size = 3) +
  annotate("text", x = cargo_v_ton_quant$q75, y = 30, label = paste("75%", cargo_v_ton_quant$q75), vjust = -1, size = 3) +
  labs(title = "Distribution of Cargo Vessels by Tonnage",
       x = "Tonnage",
       y = "Frequency") +
  theme_minimal(base_size = 8) +
  theme(axis.line = element_line(color = "black")) +
  coord_flip()

#summary(fishing_v) - min: 100, q1: 600, median: 2400, q3: 4850, max: 17200 - investigate
#summary(cargo_v) - min: 2100, q1: 2100, median: 23750, q3: 74925, max: 76300 - Investigate
fishing_v_dist | cargo_v_dist
3.4 Exploring Dwell of Vessels of Interest
3.4.1: Dwell of unusually large fishing vessels which may point to over-fishing
Steps taken:
Filter the list of vessels to “Fishing” vessels with “Tonnage” > 12,500 and identify which countries and companies these vessels belong to.
Plot the median dwell time that these unusually large fishing vessels spend in the various geographical areas.
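The two steps above can be sketched as follows. This is a minimal illustration in base R on toy data, not the report's own chunk: the report's later code refers to `abn_fish_vessel` without showing how it is built, so the `large_fishing` table here is only one plausible construction, and the toy vessel ids, companies and dwell values are hypothetical.

```r
# Toy inputs: column names mirror N_vessel and ping_activity; all values are illustrative.
vessels <- data.frame(
  vessel_id      = c("fish_big", "fish_small", "cargo_mid"),
  vessel_type    = c("Fishing", "Fishing", "Cargo_Vessel"),
  tonnage        = c(13000, 2400, 50000),
  flag_country   = c("Oceanus", "Oceanus", "Elsewhere"),
  vessel_company = c("Company A", "Company B", "Unknown"),
  stringsAsFactors = FALSE
)
pings <- data.frame(
  vessel_id = c("fish_big", "fish_big", "fish_big", "fish_small", "cargo_mid"),
  area      = c("Cod Table", "Cod Table", "Nemo Reef", "Cod Table", "Nav 1"),
  dwell     = c(100, 300, 50, 20, 10),
  stringsAsFactors = FALSE
)

# Step 1: fishing vessels above the tonnage threshold, with their country and company.
large_fishing <- vessels[vessels$vessel_type == "Fishing" & vessels$tonnage > 12500,
                         c("vessel_id", "flag_country", "vessel_company", "tonnage")]

# Step 2: median dwell per vessel and area, restricted to those vessels.
large_pings  <- pings[pings$vessel_id %in% large_fishing$vessel_id, ]
median_dwell <- aggregate(dwell ~ vessel_id + area, data = large_pings, FUN = median)
median_dwell
```

Matching with `%in%` rather than `==` matters as soon as more than one vessel passes the threshold, because `==` compares element by element and recycles the shorter vector.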
Visualisation Improvements:
Factor the areas so that they are ordered and grouped by their nature, with a shaded band per group (beige for islands / ports, light blue for fishing grounds, light green for ecological preserves, light grey for navigation buoys, and the remaining outlier of Exit East left unshaded).
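The band limits passed to `annotate("rect", ...)` follow directly from the factor order: each group of k consecutive levels spans k positions on the discrete axis, offset by 0.5. The sketch below derives those limits instead of hard-coding them. It uses base R only; the `area_groups` list and the group labels are assumptions introduced for illustration, while the area names and their order are taken from the report's factor levels.

```r
# Areas grouped by nature, in the plotting order used in the report.
area_groups <- list(
  "Navigation buoy"     = c("Nav E", "Nav D", "Nav C", "Nav B", "Nav A", "Nav 3", "Nav 2", "Nav 1"),
  "Ecological preserve" = c("Ghoti Preserve", "Don Limpet Preserve", "Nemo Reef"),
  "Fishing ground"      = c("Cod Table", "Tuna Shelf", "Wrasse Beds"),
  "Exit"                = c("Exit East"),
  "Port"                = c("City of South Paackland", "City of Paackland", "City of Lomark",
                            "City of Himark", "City of Haacklee", "City of Port Grove")
)

# Factor levels are the groups concatenated in order.
area_levels <- unlist(area_groups, use.names = FALSE)

# A group covering positions i..j on a discrete axis spans (i - 0.5, j + 0.5).
group_sizes <- lengths(area_groups)
xmax <- cumsum(group_sizes) + 0.5
xmin <- xmax - group_sizes

bands <- data.frame(
  group = names(area_groups),
  xmin  = unname(xmin),
  xmax  = unname(xmax),
  stringsAsFactors = FALSE
)
bands
```

The first three bands reproduce the limits used in the plots (0.5 to 8.5, 8.5 to 11.5, 11.5 to 14.5). The port band here ends at 21.5 because all 21 levels are counted; the plots use 20.5, which fits the 20 areas that actually appear in the data, so the last limit would need adjusting when a level has no observations.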
Insights:
2 vessels, namely “marinemarauder8c9” and “pikepirate89a”, have Tonnage > 12,500.
“marinemarauder8c9” shows a lingering presence at the ecological preserve of Nemo Reef.
The two vessels take distinct paths and spend a reasonable amount of time in authorised fishing regions.
However, “pikepirate89a” has persistently stayed in “Cod Table”, prompting further investigation of potential over-fishing after mapping with cargo ids.
# understanding region that vessel spends time at
abn_fish_v_activity <- ping_activity %>%
  filter(vessel_id %in% abn_fish_vessel$vessel_id)

abn_fish_v_activity_sum <- abn_fish_v_activity %>%
  group_by(vessel_id, area) %>%
  summarise(median_dwell = median(dwell, na.rm = TRUE))

# Contrasting with the median time spent by fishing vessels
fish_v_activity <- ping_activity %>%
  filter(vessel_id %in% fishing_v$vessel_id)

fish_v_activity_sum <- fish_v_activity %>%
  group_by(vessel_id, area) %>%
  summarise(median_dwell = median(dwell, na.rm = TRUE))

unique(fish_v_activity$area)
[1] City of Haacklee City of Lomark City of Himark
[4] City of Paackland City of South Paackland Nav 3
[7] Nav D Nav B Nav A
[10] Nav C Nav 2 Nav 1
[13] Exit East Nav E Cod Table
[16] Ghoti Preserve Wrasse Beds Nemo Reef
[19] Don Limpet Preserve Tuna Shelf
21 Levels: Nav E Nav D Nav C Nav B Nav A Nav 3 Nav 2 Nav 1 ... City of Port Grove
# assigning order to the plot
fish_v_activity$area <- factor(fish_v_activity$area,
  levels = c("Nav E", "Nav D", "Nav C", "Nav B", "Nav A", "Nav 3", "Nav 2", "Nav 1",
             "Ghoti Preserve", "Don Limpet Preserve", "Nemo Reef",
             "Cod Table", "Tuna Shelf", "Wrasse Beds",
             "Exit East",
             "City of South Paackland", "City of Paackland", "City of Lomark",
             "City of Himark", "City of Haacklee", "City of Port Grove"))

# annotate area labels by color
fishing_dwell <- ggplot(fish_v_activity, aes(x = area, y = dwell)) +
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 15.5, xmax = 20.5, alpha = 0.8, fill = "beige") +      # Ports
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 11.5, xmax = 14.5, alpha = 0.5, fill = "lightblue") +  # Fishing ground
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 8.5, xmax = 11.5, fill = "lightgreen", alpha = 0.5) +  # Ecological preserve
  annotate("rect", ymin = -Inf, ymax = Inf, xmin = 0.5, xmax = 8.5, alpha = 0.5, fill = "lightgrey") +    # Navigation buoys
  geom_boxplot() +
  geom_point(data = subset(fish_v_activity, vessel_id == "marinemarauder8c9"),
             aes(x = area, y = dwell), color = "#66FFFF", size = 1) +
  geom_point(data = subset(fish_v_activity, vessel_id == "pikepirate89a"),
             aes(x = area, y = dwell), color = "#009999", size = 1) +
  theme_minimal(base_size = 8) +
  labs(title = "Median Dwell Time by Area", x = "Area", y = "Median Dwell Time") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.line = element_line(color = "black")) +
  scale_y_continuous(limits = c(0, 600000)) +
  coord_flip()

fishing_dwell
vessels_of_interest <- c("marinemarauder8c9", "pikepirate89a")
vessel_colors <- c("marinemarauder8c9" = "#66FFFF", "pikepirate89a" = "#009999")

# creating route for selected vessel
vessel_trajectory_selected <- vessel_trajectory %>%
  filter(vessel_id %in% vessels_of_interest)

# defining colors for X.kind
kind_colors <- c("Island" = "beige", "Fishing Ground" = "lightblue", "Ecological Preserve" = "lightgreen",
                 "city" = "purple", "buoy" = "blue")

large_fish_v_route <- ggplot() +
  geom_sf(data = oceanus_geog, aes(fill = X.Kind), color = "black") +
  scale_fill_manual(values = kind_colors) +
  geom_sf(data = vessel_trajectory_selected, aes(color = vessel_id), size = 3) +
  scale_color_manual(values = vessel_colors) +
  geom_text(data = OceanusLocations_df, aes(x = XCOORD, y = YCOORD, label = Name),
            size = 2, hjust = 1, vjust = 1) +
  theme_minimal() +
  labs(title = "Trajectories of South Seafood Express Corp", x = "Longitude", y = "Latitude", color = "Vessel ID")

large_fish_v_route
3.4.2: Dwell of small (< 5,000 tons) and unusually large (> 70,000 tons) Cargo Vessels
Possibility: Unauthorised fishing by foreign-flagged vessels using small cargo vessels.
Steps taken & visualisation improvement: Similar to 3.4.1
Insights derived:
High presence in “Unrecognised” ping locations.
A substantial number of cargo vessels are comparable in size to typical fishing vessels, and the dwell time they spend in each area varies considerably. Further analysis will be done after matching vessels to cargo.