Sunday, 12 June 2016 16:09

How to Convert Joomla Articles into Word Documents


For me this is a very important process to know. I write most of my longer articles in Microsoft Office and then copy and paste them into my WYSIWYG editor (JCE Editor). This lets me write the articles more quickly, without fear of timeouts, lost content and so on.

Because most of my articles end up on my website, there is no need to keep the Word documents; doing so would get tiresome and could be confusing. There is a downside, though: if you later want an article in Word format, you no longer have the original, because the article has been converted into an HTML page.

Until now I did not know how to convert Joomla articles back into Word documents. I was used to getting undesirable results when I copied an article from the website and pasted it into Word 2010: it did not look like the original Word document, the formatting could be lost, the wrong font was used and the heading references were missing.

I will now show you how to completely restore a Word document from a Joomla article. Note that this also works with other content, such as K2 articles.

 

Different Methods I have tried

  • Pasting into LibreOffice Writer 5.1.3 (Preferred Method)
  • Create an HTML file with the HTML from the article WYSIWYG and then open that file with LibreOffice
  • Pasting into Word 2010
  • Create an HTM file from the browser and then open it with Microsoft Word 2010
  • Conversion Websites

 

Methods and their results:

Pasting into LibreOffice Writer 5.1.3 (Preferred Method)

  1. Go to the article's webpage; do not edit the article.
  2. Copy the article's content only, not the menu, statistics, modules etc. (If you need the title and it is not in the content because it is managed by the menu item, type it in later.)
  3. Create a blank LibreOffice Writer document and open it.
  4. Paste (Ctrl+V) the content into LibreOffice Writer.
    • Ctrl+V seems to paste as 'HTML without comments'.
    • After pasting into LibreOffice Writer there are no weird characters. Layout, lists and headings are maintained, but the fonts are changed to the default LibreOffice style.
  5. Save the file as a Word .docx file.
  6. Open the file in Word 2010.
    • The layout and heading references are maintained. There are no weird characters, but the styling is still the same as it was in LibreOffice.
  7. Select all of the content.
  8. Go to 'Change Styles' and select 'Word 2010'.
  9. Add the main <H1> title if needed.
  10. Save the document.
  11. Done.

You now have a Word version of your HTML article. The headings and formatting are maintained. You can also paste the Word document back into a WYSIWYG editor with no further issues.

Notes

  • I have not tested this method with images, but it should work.
  • This method seems to remove all weird characters; possibly there is an invisible font substitution that handles this exact issue.
  • I would guess that the results of using the WYSIWYG content (not the HTML) will be exactly the same.

 

Create an HTML file with the HTML from the article WYSIWYG and then open that file with LibreOffice

  1. Edit the article in Joomla.
  2. Expose the HTML and copy it.
  3. Create an article.html file on your PC's desktop and paste the code into it.
  4. Save and close the article.html file.
  5. Open the article.html file with LibreOffice Writer.
  6. Save the content as a Word .docx file (i.e. article.docx).
  7. Open article.docx in Word 2010.
  8. Select all of the content.
  9. Go to 'Change Styles' and select 'Word 2010'.
  10. Add the main <H1> title if needed.
  11. Save the document.
  12. Done.
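Steps 3 and 4 can also be scripted. The following is only a minimal sketch in Python, not part of the method I tested: the function names `wrap_fragment` and `write_article_html` are made up for illustration, and wrapping the fragment in a full page with a UTF-8 `<meta charset>` declaration is an assumption that may help LibreOffice read special characters correctly.

```python
from html import escape
from pathlib import Path


def wrap_fragment(fragment, title=""):
    """Wrap HTML copied from the editor in a minimal UTF-8 page.

    If a title is given it is added as the main <h1>, because the
    title managed by the menu item is not part of the article HTML.
    """
    heading = f"<h1>{escape(title)}</h1>\n" if title else ""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{heading}{fragment}\n"
        "</body>\n</html>\n"
    )


def write_article_html(path, fragment, title=""):
    """Save the wrapped fragment as a UTF-8 encoded file (hypothetical helper)."""
    Path(path).write_text(wrap_fragment(fragment, title), encoding="utf-8")


# Example call (placeholder content, not a real article):
# write_article_html("article.html", "<h2>Section</h2><p>Body text.</p>", "My Article")
```

The resulting article.html can then be opened with LibreOffice Writer as in step 5.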

You now have a Word version of your HTML article. There might be a few things you need to sort out, but the headings and formatting are maintained. You can also paste the Word document back into a WYSIWYG editor with no further issues.
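Opening the HTML file and saving it as .docx (steps 5 and 6) can probably be done without the GUI as well, using LibreOffice's command-line conversion. This is a sketch under assumptions, not something I have tested with LibreOffice 5.1.3: it assumes the `soffice` program is on your PATH (on Windows it usually is not, so you would pass the full path), and that the `MS Word 2007 XML` export filter is accepted for HTML input in your version. The helper names are made up for illustration.

```python
import shutil
import subprocess


def build_convert_command(html_path, out_dir=".", soffice="soffice"):
    """Build the LibreOffice command line for an HTML to .docx conversion."""
    return [
        soffice,
        "--headless",                       # run without opening a window
        "--convert-to", "docx:MS Word 2007 XML",  # extension:export filter (assumed)
        "--outdir", out_dir,                # folder for the converted file
        html_path,
    ]


def convert_to_docx(html_path, out_dir="."):
    """Run the conversion; raises if soffice is missing or reports a failure."""
    if shutil.which("soffice") is None:
        raise FileNotFoundError("soffice was not found on PATH")
    subprocess.run(build_convert_command(html_path, out_dir), check=True)


# Example call:
# convert_to_docx("article.html")
```

If the conversion fails or produces an empty file, opening the file in LibreOffice Writer by hand as described in the steps is the fallback. The remaining steps in Word 2010 still have to be done manually.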

Notes

  • It is important to use only the HTML of the article.
  • I have not tested this with images; they will possibly fail unless they use full http:// paths.
  • When I open the article.docx file in LibreOffice, a few characters appear as weird characters. This is because the original Microsoft Word document contained non-standard characters (i.e. characters that look like periods but are not, or the dashes that Word's autocorrect adds). These characters go unnoticed because they render correctly on the webpage (or appear to), but they are slightly different characters. When LibreOffice opens the HTML file it substitutes these characters with characters from the ‘SimSun’ font, giving rise to what appears to be corruption but is in fact just an unpleasant font substitution. To correct this you can:
    • Make sure there are no errors in the original Word document. The autocorrect feature is the most likely cause.
    • Remove all non-standard code in the article (Joomla/WYSIWYG)
    • Tidy the text up in LibreOffice before saving
    • Tidy the text up in Microsoft Office
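One way to tidy the text before LibreOffice opens it is to replace the usual autocorrect characters with plain ones. This is a minimal Python sketch; the list of characters is my assumption about what Word's autocorrect typically inserts (curly quotes, en and em dashes, the single-character ellipsis and non-breaking spaces), not a confirmed list of the characters that caused the ‘SimSun’ substitution.

```python
# Assumed list of Word autocorrect characters and their plain replacements.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis that looks like three periods
    "\u00a0": " ",    # non-breaking space
}
_TABLE = str.maketrans(REPLACEMENTS)


def normalise_word_characters(text):
    """Replace the characters listed above with plain ASCII equivalents."""
    return text.translate(_TABLE)


def find_non_ascii(text):
    """Return the distinct non-ASCII characters left in the text, sorted."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

Running `find_non_ascii` on the article HTML first shows which characters are actually present, so the table can be adjusted. Characters written as HTML entities (for example `&rsquo;`) are not touched by this sketch.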

 

Pasting into Word 2010

This keeps the formatting from the website but does not correctly attach heading status to the headings, even though they appear in the navigation pane. You can merge the styles, which seems to restore the fonts but not the headings.

If you use this method you will have to use the 'Change Styles' feature and manually set the headings again. There is also a chance that font settings in the document will be copied back into a Joomla article, should you want to reuse this content in the Joomla article, perhaps after editing it.

 

Create an HTM file from the browser and then open it with Microsoft Word 2010

  • Using Firefox, I saved the article's webpage as .htm and then opened it with Word.
    • This keeps the headings correctly referenced.
    • This is messy, as it keeps all of the other stuff like the menu and modules.
    • You can then remove the unneeded bits by highlighting and deleting them.
    • Select all, go to ‘Change Styles’ and select ‘Word 2010’, which makes the headings look normal. The text still stays in the wrong font and size.
    • This would cause the specified font rules to be copied back up to a Joomla article, should you want to put the article back into Joomla.
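Instead of deleting the menu and modules by hand, the saved page could be trimmed to the article content before Word opens it. The following Python sketch is an assumption-heavy illustration: it assumes the template wraps the article in an element with `itemprop="articleBody"` (check your own page source and change the attribute if your template differs), and that the saved markup is well balanced. The class and function names are made up.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ArticleBodyExtractor(HTMLParser):
    """Collect the inner HTML of the first element with attr == value."""

    def __init__(self, attr="itemprop", value="articleBody"):
        super().__init__(convert_charrefs=False)
        self.attr, self.value = attr, value
        self.depth = 0   # 0 means we are outside the article container
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.depth += 1
        elif tag not in VOID_TAGS and dict(attrs).get(self.attr) == self.value:
            self.depth = 1   # entered the container; do not copy its own tag

    def handle_startendtag(self, tag, attrs):
        if self.depth:       # self-closing tag such as <br />
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth:   # skip the container's own closing tag
                self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth:
            self.parts.append(f"&#{name};")


def extract_article_body(page_html, attr="itemprop", value="articleBody"):
    """Return only the article markup from a saved page."""
    parser = ArticleBodyExtractor(attr, value)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts).strip()
```

The extracted markup can then be saved as its own .htm file and opened with Word, which avoids deleting the menu and modules manually. It does not remove the font rules mentioned above.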

 

Conversion Websites

  • These converters try to keep the content looking the same as on the website by setting fonts and sizes, rather than preserving the elements, e.g. by converting <h1> and <h2> into heading elements in Word.
  • So these converters translate the content into a visually equivalent Word document, but there will be element recognition loss and font substitution.
  • Because of the loss of headings I would have to manually set all the headings again and then use the 'Change Styles' feature. The results will vary from converter to converter.
  • Because I want to create a Word document that uses only the default styles, with no font settings within the document, this method is not good.
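Whether a converter (or any of the methods above) has kept real headings can be checked without opening Word, because a .docx file is a zip archive containing `word/document.xml`. This Python sketch lists the paragraph styles that look like headings. It assumes heading paragraphs use style IDs beginning with "Heading" (such as `Heading1`), which is typical for English-language documents but may not hold for every converter or language; the function name is made up.

```python
import re
import zipfile


def heading_styles_used(docx_path):
    """Return the heading-like paragraph style IDs found in a .docx file.

    An empty list suggests the headings were converted to plain text
    with font settings rather than to real Word headings.
    """
    with zipfile.ZipFile(docx_path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Paragraph styles are stored as <w:pStyle w:val="StyleId"/>.
    style_ids = re.findall(r'<w:pStyle\s+w:val="([^"]+)"', xml)
    return [s for s in style_ids if s.lower().startswith("heading")]


# Example call:
# print(heading_styles_used("article.docx"))
```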

 

General Notes

  • There might be stuff that cannot be translated properly, like code highlighting, but this can be sorted out manually as it most likely never existed in a Word document anyway.

  • Office 2016 - This might be more forgiving for direct pasting of content, as it is a lot newer and HTML and document formats probably have more parity.

Last modified on Sunday, 12 June 2016 17:28