Article Extraction

Last modified November 10, 2022

GTxcel extracts article text from the PDFs that you supply to create an HTML version of your articles. The purpose of this process is to give your readers a responsive, easy-to-read version of your articles, particularly on mobile devices.

*Note: This process is largely automated. How the PDF is created, including its layering and embedded fonts, will affect the outcome of the extraction. In addition, the design needs of a highly stylized print layout tend to differ from those of a pure article text layout. For these reasons, you will need to review the extracted articles and correct any abnormalities, as well as any design elements that the automated extraction did not produce to your particular needs.

Article Extractions include:

  • All images in an article, including their caption text.
  • Tables, charts, graphs, and figures.
  • The embedded text of the PDF you provide, extracted so that all text appears exactly as it was created in the PDF.
  • (Optional) Full-page and fractional ads.

Keep the following items in mind when creating your PDFs, because how the PDF is created directly affects the quality of the article extraction output.

  • Embedded Text and Fonts – All text should be embedded text, and its fonts should be embedded in the PDF. Text that is an image, such as a vector graphic or bitmap graphic, will not be extracted as text.
  • Line breaks – Use line breaks to mark the end of a paragraph or to force the end of a line at a desired point. If line breaks are not used, text that was intended to display separately may run together. Common cases where line breaks matter include lists and poems.
  • Layers – If layers are used, the text layer should be the top layer.
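
The effect of line breaks can be illustrated with a minimal Python sketch. This is not GTxcel's extraction code; the function name `text_to_html_paragraphs` and the sample text are hypothetical. It only assumes that a converter starts a new HTML paragraph wherever the extracted text contains a line break.

```python
from html import escape


def text_to_html_paragraphs(extracted_text: str) -> str:
    """Wrap each line-break-delimited chunk of text in its own <p> element.

    Hypothetical helper for illustration only; a real extraction pipeline
    will be considerably more involved.
    """
    lines = [line.strip() for line in extracted_text.splitlines()]
    # Skip empty lines and escape characters that are special in HTML.
    return "\n".join(f"<p>{escape(line)}</p>" for line in lines if line)


# A three-item list typed with a line break after each item.
with_breaks = "Apples\nOranges\nPears"
# The same list where the items were only positioned visually, with no line breaks.
without_breaks = "Apples Oranges Pears"

print(text_to_html_paragraphs(with_breaks))
print(text_to_html_paragraphs(without_breaks))
```

With line breaks, each list item becomes its own paragraph. Without them, the three items come out as a single paragraph, which is the "run together" result described above.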
Need Help?
The Digital Help Desk is the channel for contacting GTxcel about new title setups, questions, and technical issues with the Web Reader and/or Apps.

You can submit a request to us through the Request Help button located in the Publisher Dashboard or call the support number: 800-609-8994, option 3.
Contact Support
8AM - 5PM EST
Monday to Friday
800-609-8994, option 3
Response Times
General Questions/Requests – A Customer Success team member will begin working on your request within one to two hours of receipt. We will complete the request as soon as possible and aim to have all requests completed within 24 hours.