Hi
I am working with a client who uses a lot of Word documents internally. They tend to copy from Word into the CMS, and then we get the usual woes of extra markup, CSS classes and inline CSS styles being added.
With us upgrading the project to Umbraco 16, we now have Tiptap as the rich text editor built into the CMS, and I need a way to clean up the pasted markup.
With TinyMCE we previously had a plugin that did this for us.
Has anyone managed to get this working, or found a way to clean content coming from Word nicely, when using Tiptap as the rich text editor?
Well, I have found something in Tiptap that can replace or update content as it is pasted in: paste rules.
Example:
The example looks for the word Sitecore and replaces it with Umbraco. However, I have not been able to get it to remove inline CSS styles, <o:p> style tags, etc.
manifest.ts
export const manifests: Array<UmbExtensionManifest> = [
  {
    "name": "Tiptap Cleanup Pasted Content",
    "alias": "Tiptap.CleanupPastedContent",
    "type": "tiptapExtension",
    "meta": {
      "group": "Custom",
      "label": "Cleanup Pasted Content",
      "description": "Cleans up content pasted into the editor by removing unwanted styles and elements.",
      "icon": "icon-clipboard-paste",
    },
    api: () => import('./cleanup-pasted-content/cleanup-pasted-content'),
  }
];
cleanup-pasted-content.ts
import { UmbTiptapExtensionApiBase } from '@umbraco-cms/backoffice/tiptap';
import { CleanupPastedContentExtension } from './cleanup-pasted-content-extension';

export default class CleanupPastedContentTipTapExtensionApi extends UmbTiptapExtensionApiBase {
  getTiptapExtensions = () => [CleanupPastedContentExtension];
}
cleanup-pasted-content-extension.ts
import { Extension, textPasteRule } from '@umbraco-cms/backoffice/external/tiptap';

export const CleanupPastedContentExtension = Extension.create({
  name: 'cleanupPastedContentExtension',
  addPasteRules() {
    return [
      textPasteRule({
        find: /\bSitecore\b/gi, // Match the whole word "Sitecore", case-insensitively
        replace: 'Umbraco', // Dumb replacement for testing
      }),
    ];
  },
});
So for something simple, like the documentation's example of replacing markdown-style highlighted text, paste rules can do the replacement as needed.
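A likely reason the paste rule cannot touch inline styles or <o:p> tags is that textPasteRule matches against text after the pasted HTML has already been parsed, so it never sees attributes or unknown tags. One direction to experiment with is cleaning the raw HTML string before it is parsed. ProseMirror has a transformPastedHTML editor prop that receives that string, and a Tiptap extension can register a ProseMirror plugin through addProseMirrorPlugins(). Whether the Plugin class is re-exported from '@umbraco-cms/backoffice/external/tiptap' is an assumption I have not verified. The sketch below is only the cleaning step: cleanWordHtml is a hypothetical helper name, and the regex approach is a rough first pass (it can also hit attribute-like text outside of tags), not a full sanitizer.

```typescript
// Hypothetical helper (not part of Tiptap or Umbraco): strips common
// Word paste noise from an HTML string before the editor parses it.
function cleanWordHtml(html: string): string {
  return html
    // Drop HTML comments, including Word's conditional comments
    // such as <!--[if gte mso 9]> ... <![endif]-->.
    .replace(/<!--[\s\S]*?-->/g, '')
    // Drop embedded <style> and <xml> blocks together with their contents.
    .replace(/<(style|xml)\b[^>]*>[\s\S]*?<\/\1>/gi, '')
    // Remove namespaced tags such as <o:p> and </o:p>, keeping any inner text.
    .replace(/<\/?[a-z][a-z0-9]*:[^>]*>/gi, '')
    // Remove style, class and lang attributes, whether the value is
    // double-quoted, single-quoted or unquoted (Word emits all three).
    .replace(/\s+(?:style|class|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, '');
}
```

For example, cleanWordHtml("<p class=MsoNormal>Hi<o:p></o:p></p>") returns "<p>Hi</p>". In an extension, the idea would be to return a plugin from addProseMirrorPlugins() whose props include transformPastedHTML: (html) => cleanWordHtml(html). Note that stripping style and class from everything would also strip formatting that extensions like Text Color rely on when pasting from other sources.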
But if anyone has any smart ideas or thoughts please let me know.
For now back to more experimenting…
In v16.3 (PR #20042), you’ll be able to disable Tiptap capabilities in the RTE data type config, e.g. remove the “class” and “style” attributes. But this would mean that some of the other extensions, like Text Color, Font Size, etc., wouldn’t work.
With the <o:p> tags, from what I’ve tested, the o namespace prefix should get stripped out, leaving only the <p> tags, but this may depend on the document’s content being pasted in.
I like @huwred’s suggestion with the dedicated extension, that looks to handle all the MS Office markup scenarios.
From some simple tests I ran, copying from the Word desktop app behaves a lot better than it ever used to, but copying from Word Online in a browser tab tends to bring a lot of noise along for the ride.
I am going to try out the extension with the client and their content editors to see whether it solves all the problems.