Article Extraction

Last modified March 25, 2020

GTxcel will extract article text from the PDFs that you supply to create a text-only version of your articles. The purpose of this process is to provide an easy-to-read version of your articles for readers on mobile devices. Article extraction is included in the Total Mobility pricing. If you would like article extraction for a web-only title, please contact your Account Manager for pricing.

Note: The text extraction process gives publishers a starting point for the article text view, but it is not precise. How the PDF is created, including layering and embedded fonts, affects the outcome of the article extraction. Design needs also tend to differ between a highly stylized print layout and a pure article text layout. For these reasons, you will need to review the extracted articles and correct any abnormalities or design elements that the automated process did not produce to suit your particular needs.

Article Extractions include:

  • All images in an article, including their caption text
  • Tables, charts, graphs, and figures
  • The embedded text of the PDF you provide, extracted so that all text appears exactly as it was created in the PDF

Here are several items to keep in mind when creating your PDFs.  How the PDF is created will directly affect the quality of the article extraction output.

  • Embedded Text and Fonts – All text should be embedded text, and all fonts should be embedded in the PDF. Text that is an image, such as a vector graphic or bitmap graphic, rather than embedded text will not be extracted as text.
  • Line breaks – Use line breaks to end a paragraph or to force the end of a line at a desired point. If line breaks are not used, text that was intended to display separately may run together. Common cases where line breaks matter include lists and poems.
  • Layers – If layers are used, the text layer should be the top layer.
Need Help?
The Digital Help Desk is the channel for communicating with GTxcel about new title setups, questions, and technical issues for the Web Reader and/or Apps.

You can submit a request to us through the Request Help button located in the Publisher Dashboard or call the support number: 800-609-8994, option 3.
Contact Support
8AM - 5PM EST
Monday to Friday
800-609-8994, option 3
Response Times
General Questions/Requests – A Digital Specialist will begin working on your request within one to two hours of receipt. We will complete the request as soon as possible and aim to complete all requests within 24 hours.