How do I test one-shot model performance against flawed categories?

Question

hrokr

June 3, 2022, 02:41

I'm in the process of reworking the ASAM database. Excerpted, it looks like this:

4155    PIRATES     BULK CARRIER    GULF OF ADEN: Bulk carrier fired upon 3 Aug 09 at 1500 UTC while underway in position 13-46.5N 050-42.3E. Ten heavily armed pirates in two boats fired upon the vessel underway. The pirates failed to board the vessel due to evasive action taken by the master. All crew and ship properties are safe (IMB).
4156    PIRATES     CARGO SHIP      NIGERIA: Vessel (SATURNAS) boarded, crewmembers kidnapped 3 Aug 09 while operating in the vicinity of the Escravos River in the western Niger Delta. The Lithuanian-flagged vessel came under attack by an unidentified group in a high-speed boat, according to a statement by the Lithuanian Foreign Ministry. The men kidnapped five of the crewmembers, taking them to an unknown location, but left the remaining nine crewmembers onboard. No crewmembers were reported injured, while the vessel sustained no damage (Bloomberg, LM: Topnews.in).
4157    PIRATES     BULK CARRIER    BONNY RIVER PORT HARCOURT NIGERIA: Heavily armed pirates in two speedboats, seven in each boat approached and opened fired on a bulk carrier at anchor. The vessel immediately heaved anchor and proceeded to open seas for safety reasons. One crew member injured.
4158    PIRATES     CONTAINER SHIP  75NM OFF MIRI, SARAWAK, MALAYSIA: Twelve pirates, in a seven meter long, unlit boat approached a container ship underway. They chased the ship and tried to get alongside. Alarm raised, took evasive maneuvers, alerted crewmembers and master fired rocket flares. Pirates aborted the attept.
4159    PIRATES     BULK CARRIER    BRAZIL: Bulk carrier robbed 27 Jul 09 at 2355 local time while anchored in position 01-05.41S - 048-29.08W, Mosqueiro anchorage. Robbers armed with knives boarded the vessel at anchor. They tied up the watch officer's hands and stole ship's stores before escaping (IMB).
4160    PIRATES     MERCHANT VESSEL SWEDEN: Vessel (ARCTIC SEA) boarded 24 Jul 09 at approximately 0300 local time while underway in Swedish territorial waters. Eight to twelve armed men allegedly wearing masks and uniforms bearing the word "police" boarded the vessel using a black rubber boat. While onboard, the men allegedly assaulted and tied up many of the crew members. During this time, crew members were questioned about drug trafficking. The men stayed onboard the vessel for approximately 12 hours, during which time they rummaged through the cabins. The crew members were eventually released and the men departed the vessel. It's unknown if anything was stolen. Swedish police are currently investigating the incident. UPDATE: On 4 Aug 09, the vessel was expected to make a port call in Algeria, but never arrived. Russian authorities are investigating the incident and have begun searching for the vessel. There is no further information to provide at this time (AP, Bloomberg, LM: Times of Malta).

There are currently nine categories of hostility (which is sub-optimal, but that's another issue):

Attempted Boarding
Blocking
Boarding
Fired Upon
Hijacking
Kidnapping
Hijacking/Kidnapping Combination
Robbery
Suspicious Approach

You'll notice piracy isn't a category of hostility; robbery is usually the closest category that applies. However, in the above section, you would have:

4155 as fired upon
4156 as kidnapping
4157 as fired upon
4158 would be correct as robbery
4159 as boarding.

So, I've turned to running NLP models on the descriptions to categorize the hostilities correctly. One-shot models look very promising: from a cursory inspection, they seem fairly accurate and well suited to this sort of case. However, I don't know how to test model performance, because the current categorization is so deeply flawed. I previously tried K-means clustering, which was essentially useless.

What is the best way to quantify the performance of the models? My best guess is to label a subset of the database by hand and check the models against that, but this doesn't seem like a good approach.
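For concreteness, here is a minimal sketch of the hand-labelled check I have in mind, in plain Python. The `evaluate` helper and the model predictions are hypothetical and only for illustration; the gold labels are the ones I suggested above for records 4155 to 4157.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return (accuracy, {label: (precision, recall)}) for two label lists.

    Precision or recall is None when it is undefined, i.e. the label was
    never predicted or never appears in the hand-labelled data.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one labelled record")

    pairs = list(zip(gold, predicted))
    true_pos = Counter(g for g, p in pairs if g == p)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)

    per_category = {}
    for label in sorted(set(gold) | set(predicted)):
        precision = (
            true_pos[label] / pred_counts[label] if pred_counts[label] else None
        )
        recall = (
            true_pos[label] / gold_counts[label] if gold_counts[label] else None
        )
        per_category[label] = (precision, recall)

    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    return accuracy, per_category


# Hand labels for three records, keyed by record id.
gold = {4155: "Fired Upon", 4156: "Kidnapping", 4157: "Fired Upon"}
# Hypothetical model output for the same records (made up for this sketch).
model = {4155: "Fired Upon", 4156: "Kidnapping", 4157: "Attempted Boarding"}

ids = sorted(gold)
accuracy, per_category = evaluate(
    [gold[i] for i in ids], [model[i] for i in ids]
)
print(f"accuracy: {accuracy:.2f}")
for label, (precision, recall) in per_category.items():
    print(label, precision, recall)
```

The per-category numbers matter more to me than overall accuracy, since the categories are unevenly represented and the existing labels lean heavily toward one or two of them.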

Topic one-shot-learning accuracy nlp

Category Data Science
