Weblog clients cannot retrieve posts: An invalid hexadecimal character (0x7) was found in the element content of the document

Affected blog: www.DannyQuah.com/writings/en/

Problem: Unable to retrieve posts. Reproduced with the weblog clients Windows Live Writer (WLW), ScribeFire, and Qumana. I recovered the error message about the invalid hexadecimal character from the WLW error log file and from Qumana. I believe it applies to ScribeFire as well, but I was unable to find ScribeFire's logs. Inexplicably, Blogilo under Ubuntu Linux 14.04 is able to retrieve all posts.

My other blogs

www.DannyQuah.com/writings/zh/

www.DannyQuah.com/writings/technical/

www.DannyQuah.com/writings/1s/

have all their posts retrievable by all the clients mentioned and show no errors.

My question: Is there a way to isolate where the invalid hexadecimal character appears? A previous related post, https://wordpress.org/support/topic/windows-live-writer-unable-to-retrieve-posts, was marked resolved but never identified where the error was, and I can't tell whether the error is in the post content or elsewhere in the database. I'm opening this thread because I think the error is in the WordPress content (database, post content, etc.), not in WLW or the other clients. I can't find a single file or directory to run grep on.
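One way to isolate the character is to get the content into a plain text file and scan it for every code point that XML 1.0 forbids, reporting the line and column of each hit. The file could be, for example, a SQL dump of the wp_posts table or a saved copy of the XML-RPC response a client received. The Python sketch below assumes you can produce such a file. The helpers find_invalid_xml_chars and scan_file are hypothetical names for illustration, not part of WordPress or any client.

```python
import re

# XML 1.0 allows only these code points (its "Char" production):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything else, e.g. BEL (0x7).
INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return a list of (line, column, code point) for each forbidden character.

    Lines and columns are 1-based. The text is split on newline characters
    only: str.splitlines() would treat some forbidden characters (0x0B, 0x0C,
    0x1C-0x1E) as line breaks and silently drop them from the report.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHAR.finditer(line):
            hits.append((lineno, match.start() + 1, ord(match.group())))
    return hits


def scan_file(path):
    """Print and return every forbidden character found in a UTF-8 text file."""
    # errors="replace" keeps the scan from aborting on bytes that are not
    # valid UTF-8; such bytes become U+FFFD (allowed in XML) and are not reported.
    with open(path, encoding="utf-8", errors="replace") as f:
        hits = find_invalid_xml_chars(f.read())
    for lineno, col, cp in hits:
        print(f"line {lineno}, column {col}: U+{cp:04X}")
    return hits
```

For instance, scan_file("wp_posts.sql") (a hypothetical dump filename) would print one line per offending character. The BEL character from the error message would show up as U+0007.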

Additional information: This earlier post, https://wordpress.org/support/topic/windows-live-writer-unable-to-retrieve-posts, describes editing the database to fix this. I'm wary of doing that without knowing the source of the error, because the problem might simply reappear if the WordPress software introduced it. I have no plugins activated. I have tried switching to the default themes (both 2015 and 2012), and the problem remains.

Related posts: https://wordpress.stackexchange.com/questions/175841/windows-live-writer-cannot-retrieve-posts-although-blogilo-can



The amazing Rarst solved this by suggesting I run my site through validator.w3.org. Amid the other W3C errors, the validator flags the invalid hexadecimal characters. Chrome doesn't display them (though Rarst says Opera does). I copied the affected sections into GNU Emacs, and sure enough, the characters all showed up. Cleaning these up cleared up all my problems. Now WLW, ScribeFire, Qumana, and likely all other weblog clients work for me.

Many, many thanks, Rarst!


If you run your page through the validator, it flags these broken characters as errors:

Line 107, Column 116: non SGML character number 7 …or growth and convergence. Historians (Kennedy 1989) and international relat…

I can actually see them on the page in Opera v12, but not in Chrome.

Since your page source and web server both advertise UTF-8, my guess is that these characters may well be corrupt and stored that way in your content in the database.

It might have been caused by a migration or a server misconfiguration. Unfortunately, there is no generic solution to this, and there is not much WordPress can do to help resolve it. You might end up having to clean it up manually, or semi-manually with the help of search/replace tools.
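For the semi-manual route, here is a minimal Python sketch. It assumes you have already extracted the affected post content as text, for example by copying it out of the editor or from a database export. The helper clean_post_content is hypothetical, not a WordPress function. It only transforms the text, so writing the result back through the post editor or the database is up to you. Keep a backup before you do.

```python
import re

# Matches any code point outside the XML 1.0 "Char" production
# (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]),
# such as the character the validator reported as "non SGML character number 7".
INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_post_content(text, replacement=""):
    """Return (cleaned_text, count) with every forbidden character replaced.

    The default deletes the characters outright. If a stray control
    character seems to have taken the place of a space or a punctuation
    mark, pass replacement=" " (or another visible marker) instead and
    review the result by hand.
    """
    return INVALID_XML_CHAR.subn(replacement, text)


# Example on text shaped like the passage the validator pointed at.
before = "growth and convergence.\x07 Historians (Kennedy 1989)"
after, removed = clean_post_content(before)
print(removed, repr(after))
```

The helper returns the count alongside the text. This shows which posts actually contained forbidden characters before anything is saved.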
