Did the Gutenberg block editor change the HTML in post content during import?

Which HTML is better, pre-Gutenberg or post-Gutenberg?

I imported post content from a large, old WordPress site into a fresh install with a new database: almost 1,200 posts, along with their meta and related media. The XML file is 15 MB. I used the standard WordPress import/export tools, along with a media export plugin for featured images. The origin site uses TinyMCE Advanced to keep the classic editor look for the client.

Almost everything carried over, but in the new setup the HTML of the imported content changed.

Here's how it looked on the front end. Left is the origin, right is the import.

Here is how the HTML changed. Left is the origin, right is after the import.

At what point in the process did the entire content block get wrapped in `<p>` tags, with `<br>` tags and `&nbsp;` entities added and the original line breaks missing?

Apparently, something about Gutenberg is changing the basic html of post content.

https://wordpress.org/support/topic/gutenberg-does-not-play-nicely-with-code-editor/

https://github.com/WordPress/gutenberg/issues/11211

https://core.trac.wordpress.org/ticket/45636?cversion=0cnum_hist=1

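I was able to fix the front-end paragraph spacing issue with this CSS, courtesy of Themeisle:

    br {
        /* workaround: give each <br> block display plus a bottom margin,
           so line breaks render with paragraph-like spacing */
        content: "A" !important;
        display: block !important;
        margin-bottom: 1em !important;
    }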

Even so, I want to know how best to move forward in terms of the HTML. Is this some kind of bug in Gutenberg, or should I just go ahead with the CSS fix? What is the proper HTML here? If it's wrong, is there some kind of regex magic that would fix it?

Update

In response to @tomjnowell, I ran the export with no active plugins and imported it into a fresh installation with no active plugins. Here are the results:

The XML was imported into the blank site. The post content then appears in a Gutenberg "Classic" block, in "Edit as HTML" mode, with the space between paragraphs removed and `<br>` tags and `&nbsp;` entities added. The `<br>` tags do not appear in the database, but the `&nbsp;` entities do.
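On the regex question: a cleanup pass is only safe once you know exactly what the damaged markup looks like. As a minimal Python sketch, assuming (hypothetically) that each lost paragraph break left behind a line, or an empty `<p>`, containing only `&nbsp;` or a raw U+00A0 character, the function below turns those into blank lines, which WordPress's `wpautop()` treats as paragraph breaks. `normalize_breaks` and both patterns are illustrative names, not part of WordPress; compare them against the real `post_content` before relying on anything like this.

```python
import re

# Hypothetical patterns: they assume each lost paragraph break left behind
# an empty <p>, or a line, holding only &nbsp; / U+00A0 characters.
NBSP_ONLY_PARA = re.compile(r"<p>\s*(?:&nbsp;|\u00a0)+\s*</p>", re.IGNORECASE)
NBSP_ONLY_LINE = re.compile(r"^[ \t]*(?:&nbsp;|\u00a0)+[ \t]*$", re.MULTILINE)


def normalize_breaks(html: str) -> str:
    """Replace nbsp-only paragraphs/lines with blank-line paragraph breaks."""
    html = NBSP_ONLY_PARA.sub("\n", html)
    html = NBSP_ONLY_LINE.sub("", html)
    # Collapse runs of blank lines to exactly one blank line, which
    # wpautop() reads as a single paragraph break.
    html = re.sub(r"\n\s*\n+", "\n\n", html)
    return html.strip()


print(repr(normalize_breaks("First paragraph.\n&nbsp;\nSecond paragraph.")))
```

Inline non-breaking spaces inside a sentence (for example `Keep&nbsp;this`) are deliberately left alone; only lines or paragraphs made up entirely of them are touched.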

Here is another comparison showing, from left to right, the XML, the origin DB, and the destination DB. I also noticed the following differences between the databases; not sure if it matters.

Origin DB: `post_content` is stored in a MyISAM table with collation latin1_swedish_ci.

Destination DB: `post_content` is stored in an InnoDB table with collation utf8mb4_unicode_520_ci.
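On the collation difference: collation mainly affects sorting and comparison, but the character set behind it (latin1 vs utf8mb4) decides how the text is stored as bytes. The Python snippet below is an illustration, not a diagnosis: it shows how a single non-breaking space (U+00A0) is encoded under each character set, and what happens if UTF-8 bytes are read back as latin1. It uses Python's `latin-1` codec as a stand-in for MySQL's latin1, which behaves like cp1252; the two agree for the bytes involved here.

```python
# Illustration only: one non-breaking space (U+00A0) under each charset.
NBSP = "\u00a0"

# utf8mb4 stores it as UTF-8: two bytes.
as_utf8 = NBSP.encode("utf-8")
# latin1 stores it as a single byte. (MySQL's latin1 is really cp1252, but
# Python's latin-1 codec gives the same result for the bytes used here.)
as_latin1 = NBSP.encode("latin-1")
# Reading the UTF-8 bytes back as latin1 turns them into two characters,
# "Â" followed by a non-breaking space: the classic mojibake pattern.
misread = as_utf8.decode("latin-1")

print(as_utf8, as_latin1, ascii(misread))
# prints: b'\xc2\xa0' b'\xa0' '\xc2\xa0'
```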



Gutenberg is an editor; it's not an active process that runs in the background. Unless you opened the post in the block editor, there is no way for it to modify the HTML.

Think of it this way: a PDF is a PDF; it's content, data. If I create it with one application and then copy it to a machine with a different PDF reader, the PDF is still the same. To modify it, I would have to open the PDF in an editor and re-save it.

The same is true of Gutenberg. Unless you open a post in Gutenberg and press Save/Publish, there are zero opportunities for Gutenberg to make modifications.

Try to imagine what would be involved for Gutenberg to actually do this. It would need to update every single post in the database and track which ones it had and hadn't modified. It couldn't do this in a single request, so it would have to keep polling itself until the job was done, creating major server load.

So Gutenberg is not the cause of your issue. It can't be, unless you opened all 1,200 posts, then edited and re-saved every single one of them.

So what can you do?

First, do a standard WP export, without any additional plugins interfering or meddling.

Second, check the markup in the WXR file and see whether it has the same content or whether it has been modified.

Third, import the data and check the markup in the database, not in the editor or on the front end.

Fourth, do all of the above with the standard default themes and no plugins.
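The second and third steps can be partly scripted. Below is a minimal Python sketch using only the standard library: it reads the post bodies out of a WXR export (the `<content:encoded>` element of each `<item>`), counts the markers discussed in the question (`<p>`, `<br>`, `&nbsp;`, blank lines), and diffs a WXR body against the same post's `post_content` copied out of the database. The function names are hypothetical, and how you fetch `post_content` (phpMyAdmin, WP-CLI, a SQL dump) is left open.

```python
import difflib
import re
import xml.etree.ElementTree as ET

# Namespace of the <content:encoded> element that holds post_content in WXR.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

P_TAG = re.compile(r"<p[\s>]", re.IGNORECASE)    # <p> or <p class=...>, not <pre>
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)  # <br>, <br/>, <br />


def wxr_posts(wxr_text: str) -> dict[str, str]:
    """Map each <item> title to its raw body from a WXR export.

    Keyed by title for simplicity: posts with duplicate titles overwrite
    each other.
    """
    root = ET.fromstring(wxr_text)
    posts = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        posts[title] = item.findtext(f"{{{CONTENT_NS}}}encoded", default="")
    return posts


def markup_report(html: str) -> dict[str, int]:
    """Count paragraph tags, line-break tags, nbsp's and blank lines."""
    return {
        "p": len(P_TAG.findall(html)),
        "br": len(BR_TAG.findall(html)),
        # Both the entity and a raw U+00A0 character count as nbsp.
        "nbsp": html.count("&nbsp;") + html.count("\u00a0"),
        "blank_lines": len(re.findall(r"\n[ \t]*\n", html)),
    }


def diff_bodies(wxr_body: str, db_body: str) -> list[str]:
    """Line-by-line diff of the exported body vs. the imported post_content."""
    return list(difflib.unified_diff(
        wxr_body.splitlines(), db_body.splitlines(),
        fromfile="wxr", tofile="database", lineterm="",
    ))
```

For example, `markup_report(wxr_posts(open("export.xml", encoding="utf-8").read())["Some title"])` gives the counts for one exported post; running the same report on the database copy shows at which stage the blank lines disappear or the `&nbsp;` entities appear.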

Put simply, there is no evidence that Gutenberg is at fault (there's no evidence of what is at fault; the cause is currently unknown), and Gutenberg doesn't work that way.


As a final note, what you're referring to is not normal content, and those aren't paragraphs. Those are TinyMCE Advanced "classic paragraph" blocks, and they may not work the same way. For example, how the content looks and what the actual HTML is might not be the same.
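To make that concrete: classic-editor content is typically stored in `post_content` without `<p>` tags, and WordPress adds them at display time with `wpautop()`, which turns blank lines into paragraphs and single line breaks into `<br />`. The function below is a hypothetical, heavily simplified Python stand-in for that behavior (the real `wpautop()` is PHP and handles many more cases); it only illustrates how stored text and rendered HTML can differ.

```python
import re


def simple_autop(text: str) -> str:
    """Hypothetical, heavily simplified stand-in for WordPress's wpautop().

    Only two rules are modelled: a blank line separates paragraphs, and a
    single newline inside a paragraph becomes <br />. The real function
    also handles block-level tags, shortcodes, <pre> and much more.
    """
    text = text.replace("\r\n", "\n").strip()
    if not text:
        return ""
    html = []
    for para in re.split(r"\n[ \t]*\n+", text):
        para = para.strip()
        if para:
            html.append("<p>" + para.replace("\n", "<br />\n") + "</p>")
    return "\n".join(html)
```

Feeding it `"One\n\nTwo"` gives two paragraphs, while `"One\nTwo"` gives a single paragraph containing a `<br />`, so the same words can render quite differently depending on whether a blank line survived in `post_content`.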

I'd suggest making sure TinyMCE Advanced is also installed on the target site. Otherwise, you should ask for help in the TinyMCE Advanced support forum.
