Stripping unicode characters out of slug

I am trying to strip the following characters out of slugs: ṁ, ṭ, ḍ, ṇ, ṅ, ñ, ḷ, ṃ.

I found this code here (note: I have removed public in order to get it to work)

add_action('wp_insert_post_data', __NAMESPACE__ . 'processPermalink');
/**
 * Processes the permalink so we can remove any characters that may cause a problem when communicating
 * with the API.
 *
 * @param  array $data The array of information about the post.
 * @return array $data The data without the malformed information in the post name for the URL.
 */
function processPermalink($data)
{
    if (!in_array($data['post_status'], array('draft', 'pending', 'auto-draft'))) {
        $data['post_name'] =
            preg_replace(
                '/(%ef%b8%8f|™|®|©|trade;|reg;|copy;|#8482;|#174;|#169;)/',
                '',
                $data['post_name']
            );
    }
    return $data;
}

I have tried replacing the pattern in preg_replace in these three ways, but none of them work (the original code does what it should):

'/(#8424;#8424;#8424;#7745;|#7789;|#7693;|#7751;|#7749;|#241;|#7735;|#7747;)/'
'/(#x1E41;|#x1E6D;|#x1E0D;|#x1E47;|#x1E45;|ntilde;|#x1E37;|#x1E43;)/'
'/(ṁ|ṭ|ḍ|ṇ|ṅ|ñ|ḷ|ṃ)/'

None of them successfully strip out the characters.

EDIT: I don't actually have to accomplish my goal using the above code. I'd be happy to do it some other way, including substituting the bad characters with their non-accented versions.

Any ideas? Thanks!



'/(ṁ|ṭ|ḍ|ṇ|ṅ|ñ|ḷ|ṃ)/' is a valid regex to process what you want.

$string = '/(ṁ|ṭ|ḍ|ṇ|ṅ|ñ|ḷ|ṃ)/ is my regex and my chosen string is: ṁ, ṭ, ḍ, ṇ, ṅ, ñ, ḷ, ṃ';


echo preg_replace(
    '/(ṁ|ṭ|ḍ|ṇ|ṅ|ñ|ḷ|ṃ)/',
    '',
    $string,
);

// Returns '/(|||||||)/ is my regex and my chosen string is: , , , , , , , '

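EDIT: So the issue is that the slug has already been URL-encoded by the time this filter runs, so the pattern has to match the percent-encoded bytes rather than the literal characters.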
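To see why the literal-character pattern stops matching, here is a small standalone check in plain PHP (no WordPress needed). It is only a sketch: it uses rawurlencode() as a stand-in for whatever encoding was applied to the slug, and the variable names and sample word are made up for illustration.

```php
<?php
// The pattern from the question, matching the literal characters.
$pattern = '/(ṁ|ṭ|ḍ|ṇ|ṅ|ñ|ḷ|ṃ)/';

// Illustrative sample value, not taken from a real post.
$raw = 'saṃyutta';

// Stand-in for the encoding step: ṃ becomes the three bytes %E1%B9%83.
$encoded = rawurlencode( $raw );

// On the raw string the pattern finds and removes ṃ.
$from_raw = preg_replace( $pattern, '', $raw );

// On the encoded string there is no literal ṃ left to match,
// so the string comes back unchanged.
$from_encoded = preg_replace( $pattern, '', $encoded );

echo $encoded, "\n";      // sa%E1%B9%83yutta
echo $from_raw, "\n";     // sayutta
echo $from_encoded, "\n"; // sa%E1%B9%83yutta
```

This is why the pattern below lists the percent-encoded sequences, and uses the i flag so that both %E1 and %e1 are matched.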

Try this:

add_filter( 'wp_insert_post_data', 'wpse_406105_process_permalink', 10, 1 );
/**
 * Processes the permalink so we can remove any characters that may cause a problem when communicating
 * with the API.
 *
 * @param  array $data The array of information about the post.
 * @return array $data The data without the malformed information in the post name for the URL.
 */
function wpse_406105_process_permalink( $data ) {
    if ( ! in_array( $data['post_status'], array( 'draft', 'pending', 'auto-draft' ) ) ) {
        $data['post_name'] =
            preg_replace(
                '/(%E1%B9%81|%E1%B9%AD|%E1%B8%8D|%E1%B9%87|%E1%B9%85|%C3%B1|%E1%B8%B7|%E1%B9%83)/i',
                '',
                $data['post_name']
            );
    }
    return $data;
}

You don't want the namespace there, because you have converted the method into a procedural function. It's a good idea to give procedural functions fairly unique names to avoid conflicts, so I have also renamed the function from processPermalink().

It's also worth noting that this filter has no effect until you have published the post: drafts, auto-drafts and pending posts are skipped.
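Since the question says that substituting the unaccented letters would also be acceptable, here is a minimal sketch of that variant. The helper name wpse_406105_transliterate_slug() is hypothetical (it is not a WordPress function), and the choice of replacement letter for each character is my assumption. It works on the already-encoded slug, so it could be called in place of the preg_replace() above.

```php
<?php
/**
 * Hypothetical helper: swaps the percent-encoded form of each character
 * for an unaccented ASCII letter instead of deleting it.
 *
 * @param  string $slug The URL-encoded slug.
 * @return string The slug with the listed characters transliterated.
 */
function wpse_406105_transliterate_slug( $slug ) {
    // Assumed mapping: encoded UTF-8 bytes => plain letter.
    $map = array(
        '%e1%b9%81' => 'm', // ṁ
        '%e1%b9%ad' => 't', // ṭ
        '%e1%b8%8d' => 'd', // ḍ
        '%e1%b9%87' => 'n', // ṇ
        '%e1%b9%85' => 'n', // ṅ
        '%c3%b1'    => 'n', // ñ
        '%e1%b8%b7' => 'l', // ḷ
        '%e1%b9%83' => 'm', // ṃ
    );

    // str_ireplace() ignores case, so %E1 and %e1 are both handled.
    return str_ireplace( array_keys( $map ), array_values( $map ), $slug );
}

// Illustrative inputs only.
echo wpse_406105_transliterate_slug( 'sa%e1%b9%83yutta' ), "\n"; // samyutta
echo wpse_406105_transliterate_slug( 'pa%C3%B1%C3%B1a' ), "\n";  // panna
```

Any character that is not in the map is left exactly as it was, so the list would need extending for other accented letters.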
