How to make smaller categories with factor character variables

I have a data set that consists of ISO 3166 alpha-2 country codes, for example DE, AD, AE, etc. They are coded as a factor variable in R, and there are about 173 observations.

Because there are too many codes and they would overwhelm a boxplot, I want to build a contingency table against other variables by condensing the codes into a smaller set of categories (also coded as a factor), for example:

DE, RE, ED, FR -> Europe

CA, US -> North America

VF, HG, HY, TY -> South America

HG, TY, UT, FT -> Africa

How can I do that? I tried a few things that did not work.

Thank you!

P.S. These codes are made up; they are only here to illustrate.

Topic data-table rstudio dataset data-cleaning r

Category Data Science


You are working in R, but I have a solution in Python. You could use similar logic in R:

# Say this is your list:
countries = ['IN', 'DE', 'US', 'GB']

# Create a list of country codes for each continent like this:
asia = ['IN', 'NP']
america = ['US', 'CA']
europe = ['DE', 'GB', 'FR']

# You can then map your list to these groups with a loop:
for i in range(len(countries)):
    if countries[i] in asia:
        countries[i] = 'Asia'
    elif countries[i] in america:
        countries[i] = 'America'
    elif countries[i] in europe:
        countries[i] = 'Europe'
    else:
        countries[i] = 'Others'

# Now check your list
print(countries)
# ['Asia', 'Europe', 'America', 'Europe']
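
The same mapping can also be written as a dictionary lookup, which avoids adding one elif per group as the number of continents grows. This is only a minimal sketch: the group contents are illustrative, and the names `groups`, `code_to_region` and `regions` are my own assumptions rather than anything from your data.

```python
# Hypothetical grouping; replace it with your real code-to-continent assignment.
groups = {
    'Asia': ['IN', 'NP'],
    'America': ['US', 'CA'],
    'Europe': ['DE', 'FR'],
}

# Invert the grouping into a flat code -> region lookup table.
code_to_region = {
    code: region
    for region, codes in groups.items()
    for code in codes
}

countries = ['IN', 'DE', 'US', 'FR', 'BR']

# Codes missing from the lookup fall back to 'Others'.
regions = [code_to_region.get(code, 'Others') for code in countries]

print(regions)
# ['Asia', 'Europe', 'America', 'Europe', 'Others']
```

In R, I believe a comparable effect can be had by assigning a named list to `levels(x)` for a factor `x`, or with `forcats::fct_collapse`, but I have not run either against your data, so treat that as a pointer rather than a tested answer.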

Once this list is ready, you could create your boxplot.
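
Since you also mentioned a contingency table, here is a rough sketch of how the condensed categories could be cross-tabulated against another categorical variable using only the standard library. The second variable `segment` and all of the values are made up for illustration.

```python
from collections import Counter

# Hypothetical data: one condensed region and one other categorical
# value per observation, in the same order.
regions = ['Asia', 'Europe', 'America', 'Europe', 'Europe']
segment = ['A', 'B', 'A', 'A', 'B']

# Count how often each (region, segment) pair occurs.
counts = Counter(zip(regions, segment))

row_labels = sorted(set(regions))
col_labels = sorted(set(segment))

# Build the table as nested dicts; a Counter returns 0 for unseen pairs.
table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

# Print one row per region, with the counts in column order.
for r in row_labels:
    print(r, [table[r][c] for c in col_labels])
```

With real data you would probably reach for a data-frame library instead, but the counting logic is the same.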

Hope this helps.
