Re: Reading The Text of PDF files
- Subject: Re: Reading The Text of PDF files
- From: p3consulting <email@hidden>
- Date: Wed, 1 Oct 2003 17:18:18 +0200
On Wednesday, 1 Oct 2003, at 16:08 Europe/Brussels,
email@hidden wrote:
> Message: 12
> Date: Wed, 01 Oct 2003 15:33:41 +0200
> Subject: Reading The Text of PDF files
> From: Lorenzo <email@hidden>
> To: <email@hidden>
>
> Hi,
> how can I get the pure text from a PDF file?
>
> Best Regards
> --
> Lorenzo
> email: email@hidden
ps2ascii is one solution. It is part of the Ghostscript tool suite
(http://www.ghostscript.com/,
http://www.gnu.org/software/ghostscript/ghostscript.html).
Its man page reads:
NAME
ps2ascii - Ghostscript translator from PostScript or PDF
to ASCII
SYNOPSIS
ps2ascii [ input.ps [ output.txt ] ]
ps2ascii input.pdf [ output.txt ]
DESCRIPTION
ps2ascii uses gs(1) to extract ASCII text from
PostScript(tm) or Adobe Portable Document Format (PDF)
files. If no files are specified on the command line, gs
reads from standard input; but PDF input must come from an
explicitly-named file, not standard input. If no output
file is specified, the ASCII text is written to standard
output.
ps2ascii doesn't look at font encoding, and isn't very
good at dealing with kerning, so for PostScript (but not
currently PDF), you might consider pstotext (see below).
FILES
Run "gs -h" to find the location of Ghostscript documentation
on your system, from which you can get more details.
SEE ALSO
pstotext(1),
http://www.research.digital.com/SRC/virtualpaper/pstotext.html
VERSION
This document was last revised for Ghostscript version
7.04.
AUTHOR
L. Peter Deutsch <email@hidden> was the original
author. The current version has substantial improvements
by David M. Jones <email@hidden>.
7.04 31 January 2002 PS2ASCII(1)
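If you want to drive ps2ascii from a program rather than from the shell, a minimal sketch along these lines may help. It is written in Python purely for illustration and assumes ps2ascii is installed and on your PATH. The helper names build_ps2ascii_command and extract_pdf_text are hypothetical, not part of Ghostscript. The PDF is passed as a named file because, as the man page notes, PDF input cannot come from standard input.

```python
import shutil
import subprocess
import sys


def build_ps2ascii_command(input_path, output_path=None):
    """Build the argument list for: ps2ascii input.pdf [ output.txt ]."""
    command = ["ps2ascii", input_path]
    if output_path is not None:
        # With an output file given, ps2ascii writes there instead of stdout.
        command.append(output_path)
    return command


def extract_pdf_text(pdf_path):
    """Run ps2ascii on pdf_path and return what it writes to standard output."""
    if shutil.which("ps2ascii") is None:
        raise FileNotFoundError("ps2ascii not found on PATH; is Ghostscript installed?")
    result = subprocess.run(
        build_ps2ascii_command(pdf_path),  # no output file, so text goes to stdout
        capture_output=True,
        text=True,
        errors="replace",  # tolerate any bytes that are not valid in the locale encoding
        check=True,        # raise CalledProcessError if ps2ascii exits with an error
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 2:
    print(extract_pdf_text(sys.argv[1]))
```

Saved as, say, pdftext.py (a made-up name), this would be run as "python pdftext.py input.pdf". Bear in mind the limitation quoted above: ps2ascii does not look at font encoding, so the quality of the extracted text depends on the PDF.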
Pascal Pochet
P3 Consulting
email@hidden
http://www.p3-consulting.net
_______________________________________________
cocoa-dev mailing list | email@hidden
Help/Unsubscribe/Archives:
http://www.lists.apple.com/mailman/listinfo/cocoa-dev
Do not post admin requests to the list. They will be ignored.