Last updated: Mon, 01 Nov 2010

CXII. PDF Functions

Introduction

The PDF functions in PHP can create PDF files using the PDFlib library created by Thomas Merz.

The documentation in this section is only an overview of the functions available in the PDFlib library and should not be considered an exhaustive reference. Consult the documentation included in the source distribution of PDFlib for a complete and detailed explanation of each function. It gives a very good overview of what PDFlib can do and contains the most up-to-date documentation of all functions.

All of the functions in PDFlib and the PHP module have identical function names and parameters. You will need to understand some of the basic concepts of PDF and PostScript to use this extension efficiently. All lengths and coordinates are measured in PostScript points. There are generally 72 PostScript points to an inch, but this depends on the output resolution. See the documentation included in the PDFlib distribution for a detailed explanation of the coordinate system used.
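
As a rough illustration of the point-based coordinate system, the following sketch converts inches and millimetres to points. It assumes the usual 72 points per inch; the helper names inchesToPoints() and mmToPoints() are hypothetical and are not part of PDFlib.

```php
<?php
// Assumption: 1 inch = 72 points and 1 inch = 25.4 mm (default user space).
const POINTS_PER_INCH = 72.0;
const MM_PER_INCH = 25.4;

// Hypothetical helper: convert inches to PostScript points.
function inchesToPoints(float $inches): float
{
    return $inches * POINTS_PER_INCH;
}

// Hypothetical helper: convert millimetres to PostScript points.
function mmToPoints(float $mm): float
{
    return $mm / MM_PER_INCH * POINTS_PER_INCH;
}

// An A4 sheet is 210 mm x 297 mm; rounding gives the 595 x 842 page
// size passed to pdf_begin_page() in the examples.
$a4Width  = (int) round(mmToPoints(210));
$a4Height = (int) round(mmToPoints(297));
printf("A4 is about %d x %d points\n", $a4Width, $a4Height);
```

Rounding to whole points is a convenience here; fractional values are equally valid coordinates.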

Note that most PDF functions require a pdfdoc as their first parameter. See the examples below for more information.

Note: If you are interested in free alternative PDF generators that do not rely on external PDF libraries, see this related FAQ.

Requirements

PDFlib is available for download at http://www.pdflib.com/products/pdflib/index.html, but a license must be purchased for commercial use. The JPEG and TIFF libraries are required to compile this extension.

Compatibility with old versions of PDFlib

Versions of PHP released after March 9, 2000 do not support versions of PDFlib older than 3.0.

PDFlib 3.0 or later is supported from PHP 3.0.19 onward.

Installation

This PECL extension is not bundled with PHP. Additional information such as new releases, downloads, source files, maintainer information, and a CHANGELOG can be found here: http://pecl.php.net/package/pdflib.

To get these functions to work in PHP < 4.3.9, you have to compile PHP with --with-pdflib[=DIR]. DIR is the PDFlib base install directory, defaults to /usr/local.

As of PHP 4.3.9, you must install this extension through PEAR, using the following command: pear install pdflib.

Runtime Configuration

This extension has no configuration directives defined in php.ini.

Confusion with old PDFlib versions

Since PHP 4.0.5, the PHP extension for PDFlib has been officially supported by PDFlib GmbH. This means that all the functions described in the PDFlib manual (V3.00 or later) are supported by PHP 4 with the same meaning and the same parameters. Only the return values may differ from the PDFlib manual, because PHP adopted the convention of returning FALSE. For compatibility reasons, this PDFlib binding still supports the old functions, but they should be replaced by their new versions. PDFlib GmbH will not support any problems arising from the use of these deprecated functions.

Table 1. Deprecated functions and their replacements

Old function               Replacement
pdf_put_image()            No longer needed.
pdf_execute_image()        No longer needed.
pdf_get_annotation()       pdf_get_bookmark() using the same parameters.
pdf_get_font()             pdf_get_value() passing "font" as the second parameter.
pdf_get_fontsize()         pdf_get_value() passing "fontsize" as the second parameter.
pdf_get_fontname()         pdf_get_parameter() passing "fontname" as the second parameter.
pdf_set_info_creator()     pdf_set_info() passing "Creator" as the second parameter.
pdf_set_info_title()       pdf_set_info() passing "Title" as the second parameter.
pdf_set_info_subject()     pdf_set_info() passing "Subject" as the second parameter.
pdf_set_info_author()      pdf_set_info() passing "Author" as the second parameter.
pdf_set_info_keywords()    pdf_set_info() passing "Keywords" as the second parameter.
pdf_set_leading()          pdf_set_value() passing "leading" as the second parameter.
pdf_set_text_rendering()   pdf_set_value() passing "textrendering" as the second parameter.
pdf_set_text_rise()        pdf_set_value() passing "textrise" as the second parameter.
pdf_set_horiz_scaling()    pdf_set_value() passing "horizscaling" as the second parameter.
pdf_set_text_matrix()      No longer needed.
pdf_set_char_spacing()     pdf_set_value() passing "charspacing" as the second parameter.
pdf_set_word_spacing()     pdf_set_value() passing "wordspacing" as the second parameter.
pdf_set_transition()       pdf_set_parameter() passing "transition" as the second parameter.
pdf_open()                 pdf_new() followed by a call to pdf_open_file().
pdf_set_font()             pdf_findfont() followed by a call to pdf_setfont().
pdf_set_duration()         pdf_set_value() passing "duration" as the second parameter.
pdf_open_gif()             pdf_open_image_file() passing "gif" as the second parameter.
pdf_open_jpeg()            pdf_open_image_file() passing "jpeg" as the second parameter.
pdf_open_tiff()            pdf_open_image_file() passing "tiff" as the second parameter.
pdf_open_png()             pdf_open_image_file() passing "png" as the second parameter.
pdf_get_image_width()      pdf_get_value() passing "imagewidth" as the second parameter and the image as the third parameter.
pdf_get_image_height()     pdf_get_value() passing "imageheight" as the second parameter and the image as the third parameter.

Examples

Most of the functions are fairly easy to use. The most difficult part is probably creating a first PDF document. The following example should help to get started. It creates the file test.pdf with one page. The page contains the text "Times Roman outlined" in an outlined, 30pt font. The text is also underlined.

Example 1. Creating a PDF document with PDFlib

<?php
$pdf = pdf_new();
pdf_open_file($pdf, "test.pdf");
pdf_set_info($pdf, "Author", "Javier Tacon");
pdf_set_info($pdf, "Title", "Test for PHP wrapper of PDFlib 2.0");
pdf_set_info($pdf, "Creator", "See Author");
pdf_set_info($pdf, "Subject", "Testing");
pdf_begin_page($pdf, 595, 842);
pdf_add_outline($pdf, "Page 1");
$font = pdf_findfont($pdf, "Times New Roman", "winansi", 1);
pdf_setfont($pdf, $font, 30); // 30pt, matching the description of this example
pdf_set_value($pdf, "textrendering", 1); // 1 = outlined (stroked) text
pdf_show_xy($pdf, "Times Roman outlined", 50, 750);
// underline: a stroked line just below the text baseline
pdf_moveto($pdf, 50, 740);
pdf_lineto($pdf, 330, 740);
pdf_stroke($pdf);
pdf_end_page($pdf);
pdf_close($pdf);
pdf_delete($pdf);
echo "<A HREF=getpdf.php>finished</A>";
?>
The getpdf.php script simply returns the PDF document.

Example 2. Outputting a precalculated PDF document

<?php
$filename = "test.pdf"; // assumed here: the file created in Example 1
$len = filesize($filename);
header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=foo.pdf");
readfile($filename);
?>

The PDFlib distribution contains a more complex example that draws an analog clock on a page. It uses PDFlib's in-memory creation feature so that no temporary file has to be created. The example was converted to PHP from the one shipped with PDFlib. (The same example is available in the ClibPDF documentation.)

Example 3. pdfclock example from the PDFlib distribution

<?php
$radius = 200;
$margin = 20;
$pagecount = 10;

$pdf = pdf_new();

// an empty file name makes PDFlib build the document in memory
if (!pdf_open_file($pdf, "")) {
    echo "error";
    exit;
}

pdf_set_parameter($pdf, "warning", "true");

pdf_set_info($pdf, "Creator", "pdf_clock.php");
pdf_set_info($pdf, "Author", "Uwe Steinmann");
pdf_set_info($pdf, "Title", "Analog Clock");

while ($pagecount-- > 0) {
    pdf_begin_page($pdf, 2 * ($radius + $margin), 2 * ($radius + $margin));

    pdf_set_parameter($pdf, "transition", "wipe");
    pdf_set_value($pdf, "duration", 0.5);

    // move the origin to the center of the page
    pdf_translate($pdf, $radius + $margin, $radius + $margin);
    pdf_save($pdf);
    pdf_setrgbcolor($pdf, 0.0, 0.0, 1.0);

    /* minute strokes */
    pdf_setlinewidth($pdf, 2.0);
    for ($alpha = 0; $alpha < 360; $alpha += 6) {
        pdf_rotate($pdf, 6.0);
        pdf_moveto($pdf, $radius, 0.0);
        pdf_lineto($pdf, $radius - $margin / 3, 0.0);
        pdf_stroke($pdf);
    }

    pdf_restore($pdf);
    pdf_save($pdf);

    /* 5 minute strokes */
    pdf_setlinewidth($pdf, 3.0);
    for ($alpha = 0; $alpha < 360; $alpha += 30) {
        pdf_rotate($pdf, 30.0);
        pdf_moveto($pdf, $radius, 0.0);
        pdf_lineto($pdf, $radius - $margin, 0.0);
        pdf_stroke($pdf);
    }

    $ltime = getdate();

    /* draw hour hand */
    pdf_save($pdf);
    pdf_rotate($pdf, -(($ltime['minutes'] / 60.0) + $ltime['hours'] - 3.0) * 30.0);
    pdf_moveto($pdf, -$radius / 10, -$radius / 20);
    pdf_lineto($pdf, $radius / 2, 0.0);
    pdf_lineto($pdf, -$radius / 10, $radius / 20);
    pdf_closepath($pdf);
    pdf_fill($pdf);
    pdf_restore($pdf);

    /* draw minute hand */
    pdf_save($pdf);
    pdf_rotate($pdf, -(($ltime['seconds'] / 60.0) + $ltime['minutes'] - 15.0) * 6.0);
    pdf_moveto($pdf, -$radius / 10, -$radius / 20);
    pdf_lineto($pdf, $radius * 0.8, 0.0);
    pdf_lineto($pdf, -$radius / 10, $radius / 20);
    pdf_closepath($pdf);
    pdf_fill($pdf);
    pdf_restore($pdf);

    /* draw second hand */
    pdf_setrgbcolor($pdf, 1.0, 0.0, 0.0);
    pdf_setlinewidth($pdf, 2);
    pdf_save($pdf);
    pdf_rotate($pdf, -(($ltime['seconds'] - 15.0) * 6.0));
    pdf_moveto($pdf, -$radius / 5, 0.0);
    pdf_lineto($pdf, $radius, 0.0);
    pdf_stroke($pdf);
    pdf_restore($pdf);

    /* draw little circle at center */
    pdf_circle($pdf, 0, 0, $radius / 30);
    pdf_fill($pdf);

    pdf_restore($pdf);

    pdf_end_page($pdf);

    # to see some difference
    sleep(1);
}

pdf_close($pdf);

$buf = pdf_get_buffer($pdf);
$len = strlen($buf);

header("Content-type: application/pdf");
header("Content-Length: $len");
header("Content-Disposition: inline; filename=foo.pdf");
echo $buf;

pdf_delete($pdf);
?>
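
The rotation angles in the clock example may be easier to follow in isolation. The sketch below reproduces the three formulas in a hypothetical helper, clockHandAngles(), which is not part of the example or of PDFlib. It assumes, as the example does, that an unrotated hand lies along the positive x axis (3 o'clock) and that positive angles rotate counterclockwise.

```php
<?php
// Hypothetical helper: returns the rotation (in degrees, counterclockwise)
// for each hand, using the same formulas as the pdfclock example.
// Zero degrees points along the positive x axis, i.e. at 3 o'clock.
function clockHandAngles(int $hours, int $minutes, int $seconds): array
{
    return [
        // 30 degrees per hour, plus the fraction contributed by the minutes
        'hour'   => -(($minutes / 60.0) + $hours - 3.0) * 30.0,
        // 6 degrees per minute, plus the fraction contributed by the seconds
        'minute' => -(($seconds / 60.0) + $minutes - 15.0) * 6.0,
        // 6 degrees per second
        'second' => -(($seconds - 15.0) * 6.0),
    ];
}

$t = getdate();
$angles = clockHandAngles($t['hours'], $t['minutes'], $t['seconds']);
printf("hour %.1f, minute %.1f, second %.1f\n",
    $angles['hour'], $angles['minute'], $angles['second']);
```

At 3:00:00 the hour hand needs no rotation, while the minute and second hands are turned 90 degrees so that they point at 12.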

See also

Note: An alternative PHP module for creating PDF documents, based on FastIO's ClibPDF, is available. See the ClibPDF section for details. Note that ClibPDF differs somewhat from PDFlib.

Table of Contents
pdf_activate_item -- Activate structure element or other content item
pdf_add_annotation -- Adds annotation
pdf_add_bookmark2 -- Add bookmark for current page [deprecated]
pdf_add_bookmark -- Add bookmark for current page [deprecated]
pdf_add_launchlink -- Add launch annotation for current page [deprecated]
pdf_add_locallink -- Add link annotation for current page [deprecated]
pdf_add_nameddest -- Create named destination
pdf_add_note2 -- Set annotation for current page [deprecated]
pdf_add_note -- Set annotation for current page [deprecated]
PDF_add_outline -- Adds bookmark for current page
pdf_add_pdflink -- Add file link annotation for current page [deprecated]
pdf_add_thumbnail -- Add thumbnail for current page
pdf_add_weblink -- Add weblink for current page [deprecated]
PDF_arc -- Draws an arc
pdf_arcn -- Draw a clockwise circular arc segment
pdf_attach_file2 -- Add file attachment for current page [deprecated]
pdf_attach_file -- Add file attachment for current page [deprecated]
pdf_begin_document -- Create new PDF file
pdf_begin_font -- Start a Type 3 font definition
pdf_begin_glyph -- Start glyph definition for Type 3 font
pdf_begin_item -- Open structure element or other content item
pdf_begin_layer -- Start layer
pdf_begin_page_ext -- Start new page
PDF_begin_page -- Starts new page
pdf_begin_pattern -- Start pattern definition
pdf_begin_template -- Start template definition
PDF_circle -- Draws a circle
PDF_clip -- Clips to current path
PDF_close_image -- Closes an image
pdf_close_pdi_page -- Close the page handle
pdf_close_pdi -- Close the input PDF document
PDF_close -- Closes a pdf document
PDF_closepath_fill_stroke -- Closes, fills and strokes current path
PDF_closepath_stroke -- Closes path and draws line along path
PDF_closepath -- Closes path
pdf_concat -- Concatenate a matrix to the CTM
PDF_continue_text -- Outputs text in next line
pdf_create_action -- Create action for objects or events
pdf_create_annotation -- Create rectangular annotation
pdf_create_bookmark -- Create bookmark
pdf_create_field -- Create form field
pdf_create_fieldgroup -- Create form field group
pdf_create_gstate -- Create graphics state object
pdf_create_pvf -- Create PDFlib virtual file
pdf_create_textflow -- Create textflow object
PDF_curveto -- Draws a curve
pdf_define_layer -- Create layer definition
pdf_delete_pvf -- Delete PDFlib virtual file
pdf_delete_textflow -- Delete textflow object
pdf_delete -- Delete PDFlib object
pdf_encoding_set_char -- Add glyph name and/or Unicode value
pdf_end_document -- Close PDF file
pdf_end_font -- Terminate Type 3 font definition
pdf_end_glyph -- Terminate glyph definition for Type 3 font
pdf_end_item -- Close structure element or other content item
pdf_end_layer -- Deactivate all active layers
pdf_end_page_ext -- Finish page
PDF_end_page -- Ends a page
pdf_end_pattern -- Finish pattern
pdf_end_template -- Finish template
PDF_endpath -- Ends current path
pdf_fill_imageblock -- Fill image block with variable data
pdf_fill_pdfblock -- Fill PDF block with variable data
PDF_fill_stroke -- Fills and strokes current path
pdf_fill_textblock -- Fill text block with variable data
PDF_fill -- Fills current path
pdf_findfont -- Prepare font for later use [deprecated]
pdf_fit_image -- Place image or template
pdf_fit_pdi_page -- Place imported PDF page
pdf_fit_textflow -- Format textflow in rectangular area
pdf_fit_textline -- Place single line of text
pdf_get_apiname -- Get name of unsuccessful API function
pdf_get_buffer -- Get PDF output buffer
pdf_get_errmsg -- Get error text
pdf_get_errnum -- Get error number
pdf_get_font -- Get font [deprecated]
pdf_get_fontname -- Get font name [deprecated]
pdf_get_fontsize -- Font handling [deprecated]
pdf_get_image_height -- Get image height [deprecated]
pdf_get_image_width -- Get image width [deprecated]
pdf_get_majorversion -- Get major version number [deprecated]
pdf_get_minorversion -- Get minor version number [deprecated]
PDF_get_parameter -- Gets certain parameters
pdf_get_pdi_parameter -- Get PDI string parameter
pdf_get_pdi_value -- Get PDI numerical parameter
PDF_get_value -- Gets certain numerical value
pdf_info_textflow -- Query textflow state
pdf_initgraphics -- Reset graphic state
PDF_lineto -- Draws a line
pdf_load_font -- Search and prepare font
pdf_load_iccprofile -- Search and prepare ICC profile
pdf_load_image -- Open image file
pdf_makespotcolor -- Make spot color
PDF_moveto -- Sets current point
pdf_new -- Create PDFlib object
pdf_open_ccitt -- Open raw CCITT image [deprecated]
pdf_open_file -- Create PDF file [deprecated]
PDF_open_gif -- Opens a GIF image
pdf_open_image_file -- Read image from file [deprecated]
pdf_open_image -- Use image data [deprecated]
PDF_open_jpeg -- Opens a JPEG image
PDF_open_memory_image -- Opens an image created with PHP's image functions
pdf_open_pdi_page -- Prepare a page
pdf_open_pdi -- Open PDF file
pdf_open_tiff -- Open TIFF image [deprecated]
PDF_place_image -- Places an image on the page
pdf_place_pdi_page -- Place PDF page [deprecated]
pdf_process_pdi -- Process imported PDF document
PDF_rect -- Draws a rectangle
PDF_restore -- Restores formerly saved environment
pdf_resume_page -- Resume page
PDF_rotate -- Sets rotation
PDF_save -- Saves the current environment
PDF_scale -- Sets scaling
PDF_set_border_color -- Sets color of border around links and annotations
PDF_set_border_dash -- Sets dash style of border around links and annotations
PDF_set_border_style -- Sets style of border around links and annotations
PDF_set_char_spacing -- Sets character spacing
PDF_set_duration -- Sets duration between pages
pdf_set_gstate -- Activate graphics state object
PDF_set_horiz_scaling -- Sets horizontal scaling of text
pdf_set_info_author -- Fill the author document info field [deprecated]
pdf_set_info_creator -- Fill the creator document info field [deprecated]
pdf_set_info_keywords -- Fill the keywords document info field [deprecated]
pdf_set_info_subject -- Fill the subject document info field [deprecated]
pdf_set_info_title -- Fill the title document info field [deprecated]
PDF_set_info -- Fills a field of the document information
pdf_set_layer_dependency -- Define relationships among layers
PDF_set_leading -- Sets distance between text lines
PDF_set_parameter -- Sets certain parameters
PDF_set_text_matrix -- Sets the text matrix
PDF_set_text_pos -- Sets text position
PDF_set_text_rendering -- Determines how text is rendered
PDF_set_text_rise -- Sets the text rise
PDF_set_value -- Sets certain numerical value
PDF_set_word_spacing -- Sets spacing between words
pdf_setcolor -- Set fill and stroke color
PDF_setdash -- Sets dash pattern
pdf_setdashpattern -- Set dash pattern
PDF_setflat -- Sets flatness
pdf_setfont -- Set font
PDF_setgray_fill -- Sets filling color to gray value
PDF_setgray_stroke -- Sets drawing color to gray value
PDF_setgray -- Sets drawing and filling color to gray value
PDF_setlinecap -- Sets linecap parameter
PDF_setlinejoin -- Sets linejoin parameter
PDF_setlinewidth -- Sets line width
pdf_setmatrix -- Set current transformation matrix
PDF_setmiterlimit -- Sets miter limit
pdf_setpolydash -- Set complicated dash pattern [deprecated]
PDF_setrgbcolor_fill -- Sets filling color to rgb color value
PDF_setrgbcolor_stroke -- Sets drawing color to rgb color value
PDF_setrgbcolor -- Sets drawing and filling color to rgb color value
pdf_shading_pattern -- Define shading pattern
pdf_shading -- Define blend
pdf_shfill -- Fill area with shading
PDF_show_boxed -- Output text in a box
PDF_show_xy -- Output text at given position
PDF_show -- Output text at current position
PDF_skew -- Skews the coordinate system
PDF_stringwidth -- Returns width of text using current font
PDF_stroke -- Draws line along path
pdf_suspend_page -- Suspend page
PDF_translate -- Sets origin of coordinate system
pdf_utf16_to_utf8 -- Convert string from UTF-16 to UTF-8
pdf_utf8_to_utf16 -- Convert string from UTF-8 to UTF-16
pdf_xshow -- Output text at current position


User Contributed Notes: PDF Functions
sander at alternet dot nl 23-Aug-2010 02:42
It took me some time to find out how to add a center-aligned footer. Here's how:

<?php
// place the footer line, centered; 297.5 is half the width of an A4 page (595 pt)
$p->fit_textline($textline, 297.5, 35, " position=center");
?>
Janvarev from GMail.com 08-Aug-2009 02:38
Hi,
here are some more fixes for luc's pdf2text function. It really works for my tasks.

Two fixes:
1) Different platforms put different characters after the "stream" keyword, for example: "stream\n", "stream\r", "stream\r\n". So we detect the delimiter first.
2) Some non-text blocks are detected as text, so we added a function "FilterNonText".

<?php
function handleV2($data) {
    // detect which line ending follows the "stream" keyword: \n, \r or \r\n
    $tmp = strpos($data, "stream");
    $end_stream_delimiter = substr($data, $tmp + 6, 2);

    if ($end_stream_delimiter != "\r\n") {
        $end_stream_delimiter = substr($end_stream_delimiter, 0, 1);
    }
    //echo bin2hex($end_stream_delimiter); // - debug information

    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data, "obj", "endobj");
    $a_chunks = array();
    $j = 0;

    foreach ($a_obj as $obj) {
        $a_filter = getDataArray($obj, "<<", ">>");

        if (is_array($a_filter)) {
            $j++;
            $a_chunks[$j]["filter"] = $a_filter[0];

            $a_data = getDataArray($obj, "stream" . $end_stream_delimiter, "endstream");
            if (is_array($a_data)) {
                $a_chunks[$j]["data"] = substr($a_data[0],
                    strlen("stream" . $end_stream_delimiter),
                    strlen($a_data[0]) - strlen("stream" . $end_stream_delimiter) - strlen("endstream"));
            }
        }
    }

    // decode the chunks
    $result_data = "";
    foreach ($a_chunks as $chunk) {
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        $a_filter = explode("/", $chunk["filter"]); // split() no longer exists in PHP 7+

        if (isset($chunk["data"]) && $chunk["data"] != "") {
            // look at the filter to find out which encoding has been used
            // (strpos() rather than substr(): we are searching for the filter name)
            if (strpos($chunk["filter"], "FlateDecode") !== false) {
                $data = @gzuncompress($chunk["data"]);
                if (trim($data) != "") {
                    // CHANGED HERE, before: $result_data .= ps2txt($data);
                    $result_data .= FilterNonText(PS2Text_New($data));
                } else {
                    //$result_data .= "x";
                }
            }
        }
    }
    return $result_data;
}

function FilterNonText($data) {
    // control characters 1..8 never occur in plain text
    for ($i = 1; $i < 9; $i++) {
        if (strpos($data, chr($i)) !== false) {
            return ""; // not text, something strange
        }
    }
    return $data;
}
?>

Warning: this is only a patch to "luc at phpt dot org" code. You must use his solution first, then replace function with this patch.
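
The delimiter detection from fix 1 can also be looked at on its own. This is only a sketch: detectStreamEol() is a hypothetical helper name, and the sample string is synthetic rather than a real PDF.

```php
<?php
// Hypothetical helper: returns the line ending that follows the first
// "stream" keyword ("\r\n", "\n" or "\r"), or null if none is found.
function detectStreamEol(string $data): ?string
{
    $pos = strpos($data, "stream");
    if ($pos === false) {
        return null;
    }
    $eol = substr($data, $pos + strlen("stream"), 2);
    if ($eol === "\r\n") {
        return "\r\n";
    }
    $first = substr($eol, 0, 1);
    return ($first === "\n" || $first === "\r") ? $first : null;
}

// Synthetic input, not a real PDF.
$sample = "4 0 obj\n<< /Length 5 >>\nstream\nHello\nendstream\nendobj\n";
echo bin2hex(detectStreamEol($sample)), "\n"; // prints 0a
```

Unlike the patch above, this version returns null when the keyword is missing or is not followed by a line ending, which may make failures easier to spot.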
bolyde at gmail dot com 12-Apr-2009 12:27
Hi,
To find the number of pages in a PDF file, I came up with this:

<?php
// written as a class method in my code; shown here as a plain function
function getNumPagesInPDF(array $arguments = array())
{
    @list($PDFPath) = $arguments;
    $stream = @fopen($PDFPath, "rb");
    if (!$stream) {
        return false;
    }
    $PDFContent = @fread($stream, filesize($PDFPath));
    fclose($stream);
    if (!$PDFContent) {
        return false;
    }

    $firstValue = 0;
    $secondValue = 0;
    if (preg_match("/\/N\s+([0-9]+)/", $PDFContent, $matches)) {
        $firstValue = $matches[1];
    }

    if (preg_match_all("/\/Count\s+([0-9]+)/s", $PDFContent, $matches)) {
        $secondValue = max($matches[1]);
    }
    return (($secondValue != 0) ? $secondValue : max($firstValue, $secondValue));
}
?>
bondo2 at bondo2 dot info 09-Oct-2008 02:20
<?php

// get a new PDFlib instance
$pdfFile = pdf_new();

// an empty file name makes PDFlib build the document in memory
PDF_open_file($pdfFile, "");

// document info
pdf_set_info($pdfFile, "Author", "Ahmed Elbshry");
pdf_set_info($pdfFile, "Creator", "Ahmed Elbshry");
pdf_set_info($pdfFile, "Title", "PDFlib");
pdf_set_info($pdfFile, "Subject", "Using PDFlib");

// start the page and define the width and height of the document
pdf_begin_page($pdfFile, 595, 842);

// check whether the Arial font is found, otherwise exit
if ($font = PDF_findfont($pdfFile, "Arial", "winansi", 1)) {
    PDF_setfont($pdfFile, $font, 12);
} else {
    echo ("Font Not Found!");
    PDF_end_page($pdfFile);
    PDF_close($pdfFile);
    PDF_delete($pdfFile);
    exit();
}

// start writing from the point 50,780
PDF_show_xy($pdfFile, "This Text In Arial Font", 50, 780);
PDF_end_page($pdfFile);
PDF_close($pdfFile);

// store the pdf document in $pdf
$pdf = PDF_get_buffer($pdfFile);
// get the length of the buffer to tell the browser about it
$pdflen = strlen($pdf);

// tell the browser about the pdf document
header("Content-type: application/pdf");
header("Content-length: $pdflen");
header("Content-Disposition: inline; filename=phpMade.pdf");
// output the document
print($pdf);
// delete the object
PDF_delete($pdfFile);
?>
SID TRIVEDI 20-Jan-2008 06:16
/*
Folks, there is an excellent tutorial from Rasmus Lerdorf available at the address below (it does not support I.E.):

http://talks.php.net/show/osconpdf/

There the PHP mastermind guru (father) explains text, fonts, images and their attributes nicely, with working snippets.

Another tutorial can be found at

www.devshed.com/c/a/PHP/Building-PDF-Documents-with-PHP-5

Below are the sizes of various PDF page formats.

The origin is at the lower left and the basic unit is the DTP point.

1 pt = 1/72 inch = 0.35277777778 mm

Some common page sizes (in points)

Format       Width    Height
US-Letter    612      792
US-Legal     612      1008
US-Ledger    1224     792
11x17        792      1224
A0           2380     3368
A1           1684     2380
A2           1190     1684
A3           842      1190
A4           595      842
A5           421      595
A6           297      421
B5           501      709

*/
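
One way the page-size table might be put to use is as a lookup array, for instance to find the horizontal centre of a page when placing a centered footer. This is a minimal sketch; the $pageSizes array and the pageCenterX() helper are hypothetical names, and only a subset of the table is included.

```php
<?php
// Page sizes in points, copied from the table above (a subset).
$pageSizes = [
    'US-Letter' => [612, 792],
    'US-Legal'  => [612, 1008],
    'A3'        => [842, 1190],
    'A4'        => [595, 842],
    'A5'        => [421, 595],
];

// Hypothetical helper: x coordinate of the horizontal centre of a page.
function pageCenterX(array $pageSizes, string $format): float
{
    if (!isset($pageSizes[$format])) {
        throw new InvalidArgumentException("Unknown page format: $format");
    }
    return $pageSizes[$format][0] / 2;
}

echo pageCenterX($pageSizes, 'A4'), "\n";        // prints 297.5
echo pageCenterX($pageSizes, 'US-Letter'), "\n"; // prints 306
```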
info at tecnick dot com 10-Jan-2008 12:54
For those of us that do not want to pay for a commercial license to use PDFlib I suggest TCPDF:

http://tcpdf.sf.net

TCPDF is an Open Source PHP class for generating PDF files on-the-fly without requiring external extensions. This class is already adopted by a large number of php projects such as phpMyAdmin, Drupal, Joomla, Xoops, TCExam, etc. 

Starting from 2.1 version TCPDF supports UTF-8 Unicode and bidirectional languages such as Arabic and Hebrew.
Ken McColl 21-Nov-2007 05:06
To get this to work on Windows do not use escapeshellcmd()

From online help:
Following characters are preceded by a backslash: #&;`|*?~<>^()[]{}$\, \x0A and \xFF. ' and " are escaped only if they are not paired. In Windows, all these characters plus % are replaced by a space instead.

So you are probably passing duff paths to pdf2text.exe

Removing escapeshellcmd worked for me. Just make darned sure you are in control of what is being passed through to your system call.
kangaroo232002 at yahoo dot co dot uk 18-Nov-2007 12:25
To extend alex's example earlier, you can use a couple of switches inside the pdf doc to give you the total number of pages, without using any ext. I would have added the whole code, however the site keeps on saying "line is too long... yadayada".

Open the doc using fopen("$file", "rb"); (for reading)

Test the first approx 1000b for the following regex
<?php
if (preg_match("/\/N\s+([0-9]+)/", $contents, $found)) {
    return $found[1];
}
?>

If that doesn't return anything, you have to read the rest of the file:

<?php

// the pattern is one line; it was split only because of the note length limit
preg_match_all("/\/Type\s*\/Pages\s*\/Kids\s+\[.*?\]\s*\/Count\s+([0-9]+)/", $contents, $found);

?>

This may return more than one, so look through for the highest value, which is the total number of pages in your doc.
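
The two steps described above could be combined roughly as follows. This is a heuristic sketch, not a PDF parser: countPdfPages() is a hypothetical name, the check for /Linearized before trusting /N is an added assumption, the simpler /Count pattern is used, and files that keep their page tree in compressed object streams will not match at all.

```php
<?php
// Hypothetical helper: guess the page count from raw PDF bytes.
// Returns null when neither pattern matches.
function countPdfPages(string $content): ?int
{
    // Step 1: a linearized file states its page count as /N near the start.
    $head = substr($content, 0, 1024);
    if (strpos($head, '/Linearized') !== false
        && preg_match('/\/N\s+([0-9]+)/', $head, $m)) {
        return (int) $m[1];
    }
    // Step 2: otherwise take the largest /Count found anywhere in the file.
    if (preg_match_all('/\/Count\s+([0-9]+)/', $content, $m)) {
        return max(array_map('intval', $m[1]));
    }
    return null;
}

// Synthetic input, not a real PDF.
$sample = "%PDF-1.4\n2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 3 >>\nendobj\n"
        . "5 0 obj\n<< /Type /Pages /Count 12 >>\nendobj\n";
echo countPdfPages($sample), "\n"; // prints 12
```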
Jonathon Hibbard 05-Nov-2007 03:37
The other issue with DOMpdf is that it has some pretty painful flaws.

You have to supply full paths to everything (images, includes, javascript files, etc).  And boy, do i mean everything.

Even then, it is not 100% sound.  If you have complex sites, it cannot handle it.  It instead breaks the design and only provides you with about a million broken images.

Don't get me wrong, it's GREAT for use with lower-end more simple sites, but if you have a site that say, has a javascript navigation, flash, and a bunch of container divs, it's really not going to do the job.

The above library seems to be the best fit, as about the only way to get high-end sites to work is just to manually write it out yourself using the functions above.

Sorry to bust anyone's bubble.  Good luck.
taufiq at simplybuzz dot com 23-Oct-2007 01:13
There is XPDF Win32 binary package at SourceForge for pdftotext purpose that works.

I've tried php codes below but didn't work.
praokean at yahoo dot com 22-Aug-2007 05:08
domPDF is not such a great PDF creator because it doesn't support foreign characters.
Sam from dogmaConsult.de 15-Aug-2007 02:00
I seriously tried to get PDF parsing to work to use it in the indexing for fulltext search for a document management. But none of the pdf2text functions below worked for my test cases (among them an openoffice generated pdf file and a file generated by fpdf).

But I found a REALLY WORKING SOLUTION! On linux systems, install the XPDF package. It comes with a tool called pdftotext. Use php code similar to the following to get the text content of your pdf files:

<?php
$file = "test.pdf";
$outpath = preg_replace("/\.pdf$/", "", $file) . ".txt";

system("pdftotext " . escapeshellcmd($file), $ret);
if ($ret == 0) {
    $value = file_get_contents($outpath);
    unlink($outpath);
    print $value;
}
if ($ret == 127)
    print "Could not find pdftotext tool.";
if ($ret == 1)
    print "Could not find pdf file.";
?>

The solution works on all test cases and is much more powerful than any of the previous pure php functions posted here, although only available on linux.
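
For file names that contain spaces or shell metacharacters, quoting each path with escapeshellarg() is probably the safer choice than escapeshellcmd(). The sketch below only builds the command string; txtPathFor() and buildPdftotextCommand() are hypothetical helper names, and passing the output file as a second argument to pdftotext is an addition to the note above.

```php
<?php
// Hypothetical helper: derive the output path by swapping a trailing
// ".pdf" for ".txt" (same idea as the preg_replace() in the note above).
function txtPathFor(string $pdfPath): string
{
    return preg_replace('/\.pdf$/i', '', $pdfPath) . '.txt';
}

// Hypothetical helper: build the command line. escapeshellarg() quotes
// each path as a single argument, so spaces do not split it.
function buildPdftotextCommand(string $pdfPath, string $txtPath): string
{
    return 'pdftotext ' . escapeshellarg($pdfPath) . ' ' . escapeshellarg($txtPath);
}

$pdf = 'my report.pdf';
$cmd = buildPdftotextCommand($pdf, txtPathFor($pdf));
echo $cmd, "\n";
// To actually run it (requires pdftotext to be installed):
// exec($cmd, $output, $exitCode);
```

The exact quoting characters depend on the platform, so the printed command will look different on Windows than on Linux.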
tatlar at yahoo dot com 14-Aug-2007 04:49
http://www.digitaljunkies.ca/dompdf/index.php

PHP5 class that converts HTML to PDF. From the website:
"At its heart, dompdf is (mostly) CSS2.1 compliant HTML layout and rendering engine written in PHP. It is a style-driven renderer: it will download and read external stylesheets, inline style tags, and the style attributes of individual HTML elements. It also supports most presentational HTML attributes."
david at metabin 19-Jul-2007 04:19
Easiest way to get the text of a pdf is to install xpdf (on redhat yum -y install xpdf)

then run pdftotext your.pdf, which will then generate your.txt.
jkndrkn at gmail dot com 03-May-2007 10:51
For those of us that do not want to pay for a commercial license to use PDFlib in a closed-source project, there are at least two good alternatives: FPDF and TCPDF

http://www.fpdf.org/
PHP4 and PHP5 support

http://sourceforge.net/projects/pdf-php
PHP5 support only
luc at phpt dot org 29-Mar-2007 09:09
I am trying to extract the text from PDF files and use it to feed a search engine (Intranet tool). I tried several of the "PDF2TXT" functions posted below, but they do not produce the expected result. At the very least, all words need to be separated by spaces (so they can be used as keywords), and the "junk" codes removed (for example: binary data, pictures...). I started by modifying the interesting function posted by Sven, and here is my current version, which is starting to work quite well (with PDF version 1.2). Sorry for having a quite different style of programming. Luc

<?php
// Patch for pdf2txt() posted Sven Schuberth
// Add/replace following code (cannot post full program, size limitation)

// handles the verson 1.2
// New version of handleV2($data), only one line changed
function handleV2($data){
       
   
// grab objects and then grab their contents (chunks)
   
$a_obj = getDataArray($data,"obj","endobj");
   
    foreach(
$a_obj as $obj){
       
       
$a_filter = getDataArray($obj,"<<",">>");
   
        if (
is_array($a_filter)){
           
$j++;
           
$a_chunks[$j]["filter"] = $a_filter[0];

           
$a_data = getDataArray($obj,"stream\r\n","endstream");
            if (
is_array($a_data)){
               
$a_chunks[$j]["data"] = substr($a_data[0],
       
strlen("stream\r\n"),
       
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

   
// decode the chunks
   
foreach($a_chunks as $chunk){

       
// look at each chunk and decide how to decode it - by looking at the contents of the filter
       
$a_filter = split("/",$chunk["filter"]);
       
        if (
$chunk["data"]!=""){
           
// look at the filter to find out which encoding has been used           
           
if (substr($chunk["filter"],"FlateDecode")!==false){
               
$data =@ gzuncompress($chunk["data"]);
                if (
trim($data)!=""){
           
// CHANGED HERE, before: $result_data .= ps2txt($data);   
                   
$result_data .= PS2Text_New($data);
                } else {
               
                   
//$result_data .= "x";
               
}
            }
        }
    }
    return
$result_data;
}

// New function - Extract text from PS codes
function ExtractPSTextElement($SourceString)
{
$CurStartPos = 0;
while ((
$CurStartText = strpos($SourceString, '(', $CurStartPos)) !== FALSE)
    {
   
// New text element found
   
if ($CurStartText - $CurStartPos > 8) $Spacing = ' ';
    else    {
       
$SpacingSize = substr($SourceString, $CurStartPos, $CurStartText - $CurStartPos);
        if (
$SpacingSize < -25) $Spacing = ' '; else $Spacing = '';
        }
   
$CurStartText++;

   
$StartSearchEnd = $CurStartText;
    while ((
$CurStartPos = strpos($SourceString, ')', $StartSearchEnd)) !== FALSE)
        {
        if (
substr($SourceString, $CurStartPos - 1, 1) != '\\') break;
       
$StartSearchEnd = $CurStartPos + 1;
        }
    if (
$CurStartPos === FALSE) break; // something wrong happened
   
    // Remove ending '-'
   
if (substr($Result, -1, 1) == '-')
        {
       
$Spacing = '';
       
$Result = substr($Result, 0, -1);
        }

   
// Add to result
   
$Result .= $Spacing . substr($SourceString, $CurStartText, $CurStartPos - $CurStartText);
   
$CurStartPos++;
    }
// Add line breaks (otherwise, result is one big line...)
return $Result . "\n";
}

// Global table for codes replacement
$TCodeReplace = array ('\(' => '(', '\)' => ')');

// New function, replacing old "ps2txt" function
function PS2Text_New($PS_Data)
{
global
$TCodeReplace;

// Catch up some codes
if (ord($PS_Data[0]) < 10) return '';
if (
substr($PS_Data, 0, 8) == '/CIDInit') return '';

// Some text inside (...) can be found outside the [...] sets, then ignored
// => disable the processing of [...] is the easiest solution

$Result = ExtractPSTextElement($PS_Data);

// echo "Code=$PS_Data\nRES=$Result\n\n";

// Remove/translate some codes
return strtr($Result, $TCodeReplace);
}

?>
Sven.Schuberth(at)gmx.de 28-Mar-2007 10:38
I've improved the pdf2txt code snippet, which handled PDF version 1.2.
It is now possible to translate PDF versions later than 1.2 into plain text as well.

Sven

<?php
// Function    : pdf2txt()
// Arguments   : $filename - Filename of the PDF you want to extract
// Description : Reads a pdf file, extracts data streams, and manages
//               their translation to plain text - returning the plain
//               text at the end
// Authors      : Jonathan Beckett, 2005-05-02
//                            : Sven Schuberth, 2007-03-29

function pdf2txt($filename){

   
$data = getFileData($filename);
   
   
$s=strpos($data,"%")+1;
   
   
$version=substr($data,$s,strpos($data,"%",$s)-1);
    if(
substr_count($version,"PDF-1.2")==0)
        return
handleV3($data);
    else
        return
handleV2($data);

   
}
// handles version 1.2
function handleV2($data){
       
   
// grab objects and then grab their contents (chunks)
   
$a_obj = getDataArray($data,"obj","endobj");
// initialise the counters and buffers used below
$j = 0;
$a_chunks = array();
$result_data = "";
   
    foreach(
$a_obj as $obj){
       
       
$a_filter = getDataArray($obj,"<<",">>");
   
        if (
is_array($a_filter)){
           
$j++;
           
$a_chunks[$j]["filter"] = $a_filter[0];

           
$a_data = getDataArray($obj,"stream\r\n","endstream");
            if (
is_array($a_data)){
               
$a_chunks[$j]["data"] = substr($a_data[0],
strlen("stream\r\n"),
strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
            }
        }
    }

   
// decode the chunks
   
foreach($a_chunks as $chunk){

       
// look at each chunk and decide how to decode it - by looking at the contents of the filter
       
$a_filter = explode("/",$chunk["filter"]);
       
        if (
$chunk["data"]!=""){
           
// look at the filter to find out which encoding has been used           
           
if (strpos($chunk["filter"],"FlateDecode")!==false){
               
$data =@ gzuncompress($chunk["data"]);
                if (
trim($data)!=""){
                   
$result_data .= ps2txt($data);
                } else {
               
                   
//$result_data .= "x";
               
}
            }
        }
    }
   
    return
$result_data;
}

//handles versions >1.2
function handleV3($data){
   
// grab objects and then grab their contents (chunks)
   
$a_obj = getDataArray($data,"obj","endobj");
   
$result_data="";
    foreach(
$a_obj as $obj){
       
// check if it is a string
       
if(substr_count($obj,"/GS1")>0){
           
//the strings are between ( and )
           
preg_match_all("|\((.*?)\)|",$obj,$field,PREG_SET_ORDER);
            if(
is_array($field))
                foreach(
$field as $data)
                   
$result_data.=$data[1];
        }
    }
    return
$result_data;
}

function
ps2txt($ps_data){
   
$result = "";
   
$a_data = getDataArray($ps_data,"[","]");
    if (
is_array($a_data)){
        foreach (
$a_data as $ps_text){
           
$a_text = getDataArray($ps_text,"(",")");
            if (
is_array($a_text)){
                foreach (
$a_text as $text){
                   
$result .= substr($text,1,strlen($text)-2);
                }
            }
        }
    } else {
       
// the data may just be in raw format (outside of [] tags)
       
$a_text = getDataArray($ps_data,"(",")");
        if (
is_array($a_text)){
            foreach (
$a_text as $text){
               
$result .= substr($text,1,strlen($text)-2);
            }
        }
    }
    return
$result;
}

function
getFileData($filename){
   
$handle = fopen($filename,"rb");
   
$data = fread($handle, filesize($filename));
   
fclose($handle);
    return
$data;
}

function
getDataArray($data,$start_word,$end_word){

   
$start = 0;
   
$end = 0;
    unset(
$a_result);
   
    while (
$start!==false && $end!==false){
       
$start = strpos($data,$start_word,$end);
        if (
$start!==false){
           
$end = strpos($data,$end_word,$start);
            if (
$end!==false){
               
// data is between start and end
               
$a_result[] = substr($data,$start,$end-$start+strlen($end_word));
            }
        }
    }
    return
$a_result;
}
?>
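The version check in pdf2txt() above slices the header by character offsets. A minimal sketch of the same idea using a regular expression follows. The function names pdf_header_version() and pdf_is_newer_than_1_2() are hypothetical, and the sketch assumes the "%PDF-x.y" header appears within the first 1024 bytes of the data.

```php
<?php
// Hypothetical helper: returns the version string from a "%PDF-x.y" header,
// or null when no header is found near the start of the data.
function pdf_header_version($data)
{
    // Only inspect the beginning of the data; the header is expected there.
    $head = substr($data, 0, 1024);
    if (preg_match('/%PDF-(\d+\.\d+)/', $head, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: true when the header names a version later than 1.2.
// version_compare() orders "1.2" < "1.3" < "1.10" correctly,
// which a plain string comparison would not.
function pdf_is_newer_than_1_2($data)
{
    $version = pdf_header_version($data);
    return $version !== null && version_compare($version, '1.2', '>');
}
```

With helpers like these, pdf2txt() could call handleV3() when pdf_is_newer_than_1_2($data) is true and handleV2() otherwise.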
brendandonhue at comcast dot net 22-Aug-2006 08:35
Here is a function to test whether a file is a PDF without using any external library.
<?php
// The magic bytes spell "%PDF-"; single backslashes are needed so that
// PHP interprets the \x escapes.
define('PDF_MAGIC', "\x25\x50\x44\x46\x2D");
function is_pdf($filename) {
  return file_get_contents($filename, false, null, 0, strlen(PDF_MAGIC)) === PDF_MAGIC;
}
?>
It's not checking if the whole file is valid, just if the correct header is present at the beginning of the file.
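A variant of the same header check can be sketched with fopen() and fread(), so that a missing or unreadable file yields false rather than a warning. The function name has_pdf_header() is hypothetical, and like the function above it only looks at the first five bytes.

```php
<?php
// Hypothetical variant of the header check: reads only the first five bytes
// and returns false when the file is missing or unreadable.
function has_pdf_header($filename)
{
    if (!is_file($filename) || !is_readable($filename)) {
        return false;
    }
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    $magic = fread($handle, 5);   // "%PDF-" is five bytes long
    fclose($handle);
    return $magic === "%PDF-";
}
```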
MAGnUm at magnumhome dot servehttp.com 17-Jul-2006 02:01
domPDF is also a great PDF creation interface. It basically converts your code to CSS and then builds the PDF from that, with absolute positions and so on.
spingary at yahoo dot com 12-Jan-2006 12:55
I was having trouble streaming inline PDFs using PHP 5.0.2 and Apache 2.0.54.

This is my code:

<?php
header("Pragma: public");
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: must-revalidate");
header("Content-type: application/pdf");
header("Content-Length: ".filesize($file));
header("Content-disposition: inline; filename=$file");
header("Accept-Ranges: ".filesize($file));
readfile($file);
exit();
?>
It would work fine in Mozilla Firefox (1.0.7) but with IE (6.0.2800.1106) it would not bring up the Adobe Reader plugin and instead ask me to save it or open it as a PHP file.

Oddly enough, I turned off ZLib.compression and it started working.  I guess the compression is confusing IE.  I tried leaving out the content-length header thinking maybe it was unmatched filesize (uncompressed number vs actual received compressed size), but then without it it screws up Firefox too. 

What I ended up doing was disabling Zlib compression for the PDF output pages using ini_set:

<?php
ini_set('zlib.output_compression','Off');
?>

Maybe this will help someone. Will post over in the PDF section as well.
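A minimal sketch of keeping the two pieces together, so that the Content-Length header always describes uncompressed output. The function names pdf_inline_headers() and send_pdf_inline() are hypothetical; only the ini_set() call and the header names come from the note above.

```php
<?php
// Hypothetical helper: builds the header lines for serving a PDF file inline.
// Content-Length describes the uncompressed file, so it is only correct
// when output compression is switched off for this response.
function pdf_inline_headers($path)
{
    return array(
        'Content-Type: application/pdf',
        'Content-Length: ' . filesize($path),
        'Content-Disposition: inline; filename="' . basename($path) . '"',
    );
}

// Hypothetical helper: disables zlib output compression, then sends the file.
// It must run before any other output is sent.
function send_pdf_inline($path)
{
    ini_set('zlib.output_compression', 'Off');
    foreach (pdf_inline_headers($path) as $line) {
        header($line);
    }
    readfile($path);
}
```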
ontwerp AT zonnet.nl 03-Nov-2005 11:01
I was searching for a low-cost, open-source way to combine static HTML files [used as templates] with dynamic output from Perl or PHP routines. Eventually I found that the following was the most stable, fastest and most customizable way to produce usable PDFs with nice formatting:

1] Create the HTML page output [Perl -> HTML output, direct HTML output from any application, PHP echo statements, etc.] [sort these HTML files locally]

2] Convert all the HTML [including links to web images, tables, font formatting, etc.] to [E]PS files with the Perl application html2ps [as mentioned below]
http://user.it.uu.se/~jan/html2ps.html [sort all PS files by their future PDF page positions]

3] Use the free ps2pdf/ps2pdfwr Linux application
http://www.ps2pdf.com/convert/index.htm [uses the Ghostscript and Ghostview libraries, among others]
It has great formatting options such as headers, footers and numbering
[sort the PDF files]

4] Merge all the PDF files into one PDF file with pdftk [PDF toolkit], which offers optional compression, encryption, background stamps, etc.

One might ask why I use different scripts:
- The Perl/PHP combination is great: in my experience Perl is faster at some tasks, such as conversion to PS files
- PS to PDF is quicker than direct PHP to PDF [in my experience!]
- I have total control over every file: whenever I change the HTML files used as templates, I only need an editor or another application [online or offline].

P.S. I had to build an open-source solution for creating simple analysis reports made up of things like:
- a first page [name / title / # / date]
- some static information [such as an introduction, copyright notice, etc.]
- some dynamic information [output from PHP -> database queries] combined
with HTML tags, images, etc.

All of this is mixed [and therefore separated into files for transparency]. The three-step approach [data -> HTML, HTML -> PS, PS -> PDF] is also easier and quicker to program or adjust at every step.

Correct me if I'm wrong [mail me too]

ing. Valentijn Langendorff
Design & Technologist
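Steps 2 to 4 above can be sketched as a list of shell commands built from PHP. This is only an illustration: build_pdf_pipeline() and the file names are hypothetical, and it assumes html2ps, ps2pdf and pdftk are installed and on the PATH. The commands are returned rather than executed, so they can be inspected before being passed to exec().

```php
<?php
// Hypothetical helper: builds the shell commands for the
// html -> ps -> pdf -> merged pdf pipeline. File names are placeholders.
function build_pdf_pipeline(array $htmlFiles, $outputPdf)
{
    $commands = array();
    $pdfFiles = array();
    foreach ($htmlFiles as $html) {
        $base = preg_replace('/\.html?$/i', '', $html);
        $ps   = $base . '.ps';
        $pdf  = $base . '.pdf';
        // Step 2: html2ps writes PostScript to standard output.
        $commands[] = 'html2ps ' . escapeshellarg($html) . ' > ' . escapeshellarg($ps);
        // Step 3: ps2pdf takes an input and an output file name.
        $commands[] = 'ps2pdf ' . escapeshellarg($ps) . ' ' . escapeshellarg($pdf);
        $pdfFiles[] = escapeshellarg($pdf);
    }
    // Step 4: pdftk concatenates the PDFs in the order given.
    $commands[] = 'pdftk ' . implode(' ', $pdfFiles) . ' cat output ' . escapeshellarg($outputPdf);
    return $commands;
}
```

The order of $htmlFiles decides the page order of the merged document, which matches the sorting steps described above.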
ragnar at deulos dot com 07-Oct-2005 07:30
After a whole day spent understanding how PDFlib works, I came to the conclusion that drawing with nothing but function calls is hard enough, and that drawing a single line may take something like four lines of code. So I wrote my own functions to make life easier and the code easier to understand and modify. I also wrote a function that draws a rectangle with rounded corners, with the option of filling it ;)

You can get it from http://www.deulos.com/pdf_php.php

Feel free to make suggestions or whatever you like ;o)
17-Sep-2005 11:26
Some code that can be very helpful for beginners.

<?php

   
// Declare PDF File

   
$pdf = pdf_new();
   
PDF_open_file($pdf);

   
// Set Document Properties

   
PDF_set_info($pdf, "author", "Alexander Pas");
   
PDF_set_info($pdf, "title", "PDF by PHP Example");
   
PDF_set_info($pdf, "creator", "Alexander Pas");
   
PDF_set_info($pdf, "subject", "Testing Code");

   
// Get fonts to use

   
pdf_set_parameter($pdf, "FontOutline", "Arial=arial.ttf"); // get a custom font
   
$font1 = PDF_findfont($pdf, "Helvetica-Bold", "winansi", 0); // declare default font
   
$font2 = PDF_findfont($pdf, "Arial", "winansi", 1); // declare custom font & embed into file

    /*
    You can safely use the following 14 font types (the default fonts)
    Courier, Courier-Bold, Courier-Oblique, Courier-BoldOblique
    Helvetica, Helvetica-Bold, Helvetica-Oblique, Helvetica-BoldOblique
    Times-Roman, Times-Bold, Times-Italic, Times-BoldItalic
    Symbol, ZapfDingbats
    */

    // make the images

   
$image1 = PDF_open_image_file($pdf, "gif", "image.gif"); //supported filetypes are: jpeg, tiff, gif, png.

    //Make First Page

   
PDF_begin_page($pdf, 450, 450); // page width and height.
   
$bookmark = PDF_add_bookmark($pdf, "Front"); // add a top level bookmark.
   
PDF_setfont($pdf, $font1, 12); // use this font from now on.
   
PDF_show_xy($pdf, "First Page!", 5, 225); // show this text at (5, 225); PDF coordinates are measured from the bottom left by default.
   
pdf_place_image($pdf, $image1, 255, 5, 1); // last number scales it.
   
PDF_end_page($pdf); // End of Page.

    //Make Second Page

   
PDF_begin_page($pdf, 450, 225); // page width and height.
   
$bookmark1 = PDF_add_bookmark($pdf, "Chapter1", $bookmark); // add a nested bookmark. (can be nested multiple times.)
   
PDF_setfont($pdf, $font2, 12); // use this font from now on.
   
PDF_show_xy($pdf, "Chapter1!", 225, 5);
   
PDF_add_bookmark($pdf, "Chapter1.1", $bookmark1); // add a nested bookmark (already in a nested one).
   
PDF_setfont($pdf, $font1, 12);
   
PDF_show_xy($pdf, "Chapter1.1", 225, 5);
   
PDF_end_page($pdf);
   
   
// Finish the PDF File
   
   
PDF_close($pdf); // End Of PDF-File.
   
$output = PDF_get_buffer($pdf); // assemble the file in a variable.

    // Output Area

   
header("Content-type: application/pdf"); //set filetype to pdf.
   
header("Content-Length: ".strlen($output)); //content length
   
header("Content-Disposition: attachment; filename=test.pdf"); // you can use inline or attachment.
   
echo $output; // actual print area!

    // Cleanup

   
PDF_delete($pdf);
?>
thodge at ipswich dot qld dot gov dot au 04-Sep-2005 10:22
Yet another addition to the PDF text extraction code last posted by jorromer. The code only seemed to work for PDF 1.2 (Acrobat 3.x) or below. This pdfExtractText function uses regular expressions to cover cases I have found in PDF 1.3 and 1.4 documents. The code also handles closing brackets in the text stream, which were ignored by the previous version. My regular expression skills are somewhat lacking, so improvements may be possible by a more skilled programmer. I'm sure there are still cases that this function will not handle, but I haven't come across any yet...

<?php

function pdf2string($sourcefile) {

   
$fp = fopen($sourcefile, 'rb');
   
$content = fread($fp, filesize($sourcefile));
   
fclose($fp);

   
$searchstart = 'stream';
   
$searchend = 'endstream';
   
$pdfText = '';
   
$pos = 0;
   
$pos2 = 0;
   
$startpos = 0;

    while (
$pos !== false && $pos2 !== false) {

       
$pos = strpos($content, $searchstart, $startpos);
       
$pos2 = strpos($content, $searchend, $startpos + 1);

        if (
$pos !== false && $pos2 !== false){

            if (
$content[$pos] == 0x0d && $content[$pos + 1] == 0x0a) {
               
$pos += 2;
            } else if (
$content[$pos] == 0x0a) {
               
$pos++;
            }

            if (
$content[$pos2 - 2] == 0x0d && $content[$pos2 - 1] == 0x0a) {
               
$pos2 -= 2;
            } else if (
$content[$pos2 - 1] == 0x0a) {
               
$pos2--;
            }

           
$textsection = substr(
               
$content,
               
$pos + strlen($searchstart) + 2,
               
$pos2 - $pos - strlen($searchstart) - 1
           
);
           
$data = @gzuncompress($textsection);
           
$pdfText .= pdfExtractText($data);
           
$startpos = $pos2 + strlen($searchend) - 1;

        }
    }

    return
preg_replace('/(\s)+/', ' ', $pdfText);

}

function
pdfExtractText($psData){

    if (!
is_string($psData)) {
        return
'';
    }

   
$text = '';

   
// Handle brackets in the text stream that could be mistaken for
    // the end of a text field. I'm sure you can do this as part of the
    // regular expression, but my skills aren't good enough yet.
   
$psData = str_replace('\)', '##ENDBRACKET##', $psData);
   
$psData = str_replace('\]', '##ENDSBRACKET##', $psData);

   
preg_match_all(
       
'/(T[wdcm*])[\s]*(\[([^\]]*)\]|\(([^\)]*)\))[\s]*Tj/si',
       
$psData,
       
$matches
   
);
    for (
$i = 0; $i < sizeof($matches[0]); $i++) {
        if (
$matches[3][$i] != '') {
           
// Run another match over the contents.
           
preg_match_all('/\(([^)]*)\)/si', $matches[3][$i], $subMatches);
            foreach (
$subMatches[1] as $subMatch) {
               
$text .= $subMatch;
            }
        } else if (
$matches[4][$i] != '') {
           
$text .= ($matches[1][$i] == 'Tc' ? ' ' : '') . $matches[4][$i];
        }
    }

   
// Translate special characters and put back brackets.
   
$trans = array(
       
'...'                => '…',
       
'\205'                => '…',
       
'\221'                => chr(145),
       
'\222'                => chr(146),
       
'\223'                => chr(147),
       
'\224'                => chr(148),
       
'\226'                => '-',
       
'\267'                => '•',
       
'\('                => '(',
       
'\['                => '[',
       
'##ENDBRACKET##'    => ')',
       
'##ENDSBRACKET##'    => ']',
       
chr(133)            => '-',
       
chr(141)            => chr(147),
       
chr(142)            => chr(148),
       
chr(143)            => chr(145),
       
chr(144)            => chr(146),
    );
   
$text = strtr($text, $trans);

    return
$text;

}

?>
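The escaped closing brackets that pdfExtractText() handles with placeholder strings can also be handled inside the regular expression itself. A minimal sketch follows. The function name pdf_literal_strings() is hypothetical, it does not handle nested unescaped parentheses, and it leaves octal escapes such as \222 untouched.

```php
<?php
// Hypothetical helper: returns the literal strings "(...)" found in a
// decoded content stream, treating "\(" and "\)" as part of the text.
function pdf_literal_strings($stream)
{
    // Inside the parentheses, accept either a non-special character
    // or a backslash followed by any single character.
    preg_match_all('/\(((?:[^()\\\\]|\\\\.)*)\)/s', $stream, $matches);

    $result = array();
    foreach ($matches[1] as $raw) {
        // Undo the three escapes handled here; octal escapes are left as-is.
        $result[] = strtr($raw, array('\\(' => '(', '\\)' => ')', '\\\\' => '\\'));
    }
    return $result;
}
```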
28-Aug-2005 09:58
If you want to display the number of pages (for example: page 1 of 3) then the following code could be helpful:

<?php
// ...

$pdf->begin_page_ext(842, 595, "");
// ... add text, images, ...
$pdf->suspend_page("");

$pdf->begin_page_ext(842, 595, "");
// ... add text, images, ...
$pdf->suspend_page("");

// ... create all pages

$pdf->resume_page("pagenumber 1");
// ... add number of pages to page 1
$pdf->end_page_ext("");

$pdf->resume_page("pagenumber 2");
// ... add number of pages to page 2
$pdf->end_page_ext("");

// ...
?>
jorromer at uchile dot cl -- Krash 07-Jun-2005 10:51
I recently used mattb's code below to extract text from PDF files. I modified it so that it extracts only text fields.

I hope this helps someone.

Here is the function:

<?php

  $text
= pdf2string("file.pdf");
  echo
$text;

  function
pdf2string($sourcefile){
   
$fp = fopen($sourcefile, 'rb');
   
$content = fread($fp, filesize($sourcefile));
   
fclose($fp);

   
$searchstart = 'stream';
   
$searchend = 'endstream';
   
$pdfdocument = '';
   
$pos = 0;
   
$pos2 = 0;
   
$startpos = 0;
  
    while(
$pos !== false && $pos2 !== false ){
     
$pos = strpos($content, $searchstart, $startpos);
     
$pos2 = strpos($content, $searchend, $startpos + 1);
    
      if (
$pos !== false && $pos2 !== false){
        if (
$content[$pos]==0x0d && $content[$pos+1]==0x0a) $pos+=2;
        else if (
$content[$pos]==0x0a) $pos++;

        if (
$content[$pos2-2]==0x0d && $content[$pos2-1]==0x0a) $pos2-=2;
        else if (
$content[$pos2-1]==0x0a) $pos2--;

       
$textsection = substr($content, $pos + strlen($searchstart) + 2, $pos2 - $pos - strlen($searchstart) - 1);
       
$data = @gzuncompress($textsection);
       
$data = ExtractText2($data);
       
$startpos = $pos2 + strlen($searchend) - 1;
       
        if (
$data === false){
          return -
1;}
         
       
$pdfdocument .= $data;}}
   return
$pdfdocument;}

function
ExtractText2($postScriptData){
 
$sw = true;
 
$textStart = 0;
 
$len = strlen($postScriptData);

  while (
$sw){
   
$ini = strpos($postScriptData, '(', $textStart);
   
$end = strpos($postScriptData, ')', $textStart+1);
    if ((
$ini>0) && ($end>$ini)){
     
$valtext = strpos($postScriptData,'Tj',$end+1);
      if (
$valtext == $end + 2)
       
$text .= substr($postScriptData,$ini+1,$end - $ini - 1);}
     
   
$textStart = $end + 1;
    if (
$len<=$textStart) $sw=false;
   
    if ((
$ini == 0) && ($end == 0)) $sw=false;}
 
 
$trans = array("\\341" => "a","\\351" => "e","\\355" => "i","\\363" => "o","\\223" => "","\\224" => "");
 
$text  = strtr($text, $trans);
  return
$text;
}
?>
webadmin at secretscreen dot com 05-Apr-2005 02:51
I found this info about pdflib scope on a Chinese (I think) site and translated it.  I was trying to do pdf_setfont and kept getting the wrong scope error.  Turns out it has to be in the Page scope.  So pdf_setfont will only work when called between pdf_begin_page and pdf_end_page.

#########################################
When a PDFlib API function is called, the error "Can't - IN 'document' scope" occurs
PDFlib has a concept of "scope": every PDFlib API function may only be called within certain scopes. This error occurs when a function is called outside the scopes it is allowed in. Using the table below as a reference, please verify where the API call is made.

Path: between PDF_moveto(), PDF_circle(), PDF_arc(), PDF_arcn() or PDF_rect() and PDF_stroke(), PDF_closepath_stroke(), PDF_fill(), PDF_fill_stroke(), PDF_closepath_fill_stroke(), PDF_clip() or PDF_endpath()

Page: between PDF_begin_page() and PDF_end_page(), outside path scope

Template: between PDF_begin_template() and PDF_end_template(), outside path scope

Pattern: between PDF_begin_pattern() and PDF_end_pattern(), outside path scope

Font: between PDF_begin_font() and PDF_end_font(), outside glyph scope

Glyph: between PDF_begin_glyph() and PDF_end_glyph(), outside path scope

Document: between PDF_open_*() and PDF_close(), outside page, template and pattern scope

Object: between PDF_new() and PDF_delete(), outside every other scope

Null: outside object scope

Any: All scopes other than 

##########################################

Hope this helps others as much as it helped me!!!
chu61 dot tw at gmail dot com 06-Mar-2005 07:57
How do you find out how many pages a PDF has? I read the PDF specification, version 1.6, and found this:

PDF uses a "page tree node" to define the ordering of pages in the document. The tree structure allows PDF applications to quickly open a document containing thousands of pages while using little memory.

If a PDF has 63 pages, the page tree node will look like this...

2 0 obj
<< /Type /Pages
    /Kids [ 4 0 R
               10 0 R
             ]
     /Count 63        <---- YES, got it
>>
endobj

[P.S.] A PDF may contain more than one page tree node. The right answer is in the root page tree node: a node that has /Count XX together with a /Parent XXX entry is not the root page tree node.

So you must find the node that has /Count XX and no /Parent entry, and you'll get the total number of pages in the PDF.

This works for %PDF-1.0 through %PDF-1.5.

Alex from Taipei, Taiwan
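The search described above can be sketched as follows. This is only an illustration: the function name pdf_page_count() is hypothetical, and it assumes the page tree objects are stored as plain text in the file, not inside a compressed object stream.

```php
<?php
// Hypothetical helper: returns the page count from the root page tree node,
// or null when no such node is found in plain (uncompressed) form.
function pdf_page_count($data)
{
    // Collect the body of every "N G obj ... endobj" block.
    if (!preg_match_all('/\d+\s+\d+\s+obj(.*?)endobj/s', $data, $objects)) {
        return null;
    }
    foreach ($objects[1] as $body) {
        $isPagesNode = preg_match('/\/Type\s*\/Pages\b/', $body);
        $hasParent   = preg_match('/\/Parent\b/', $body);
        // The root node is the /Pages node without a /Parent entry.
        if ($isPagesNode && !$hasParent && preg_match('/\/Count\s+(\d+)/', $body, $m)) {
            return (int) $m[1];
        }
    }
    return null;
}
```

It would be called as pdf_page_count(file_get_contents($filename)).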
mattb at bluewebstudios dot com 04-Feb-2005 01:44
I recently tested Donatas' code below for the extraction of text from PDF files.  After running into a few problems where PDF files were not being read at all, I've modified it somewhat.  It still isn't perfect, but should work great for searching.  Thanks Donatas.

<?php
$test
= pdf2string("<pathtoPDFfile>");
echo
"$test";

# Returns a -1 if uncompression failed
function pdf2string($sourcefile)
{
  
$fp = fopen($sourcefile, 'rb');
  
$content = fread($fp, filesize($sourcefile));
  
fclose($fp);

  
# Locate all text hidden within the stream and endstream tags
  
$searchstart = 'stream';
  
$searchend = 'endstream';
  
$pdfdocument = "";

  
$pos = 0;
  
$pos2 = 0;
  
$startpos = 0;
  
# Iterate through each stream block
  
while( $pos !== false && $pos2 !== false )
   {
     
# Grab beginning and end tag locations if they have not yet been parsed
     
$pos = strpos($content, $searchstart, $startpos);
     
$pos2 = strpos($content, $searchend, $startpos + 1);
      if(
$pos !== false && $pos2 !== false )
      {
        
# Extract compressed text from between stream tags and uncompress
        
$textsection = substr($content, $pos + strlen($searchstart) + 2, $pos2 - $pos - strlen($searchstart) - 1);
        
$data = @gzuncompress($textsection);
        
# Clean up text via a special function
        
$data = ExtractText($data);
        
# Increase our PDF pointer past the section we just read
        
$startpos = $pos2 + strlen($searchend) - 1;
         if(
$data === false ) { return -1; }
        
$pdfdocument = $pdfdocument . $data;
      }
   }

   return
$pdfdocument;
}

function
ExtractText($postScriptData)
{
   while( ((
$textStart = strpos($postScriptData, '(', $textStart)) && ($textEnd = strpos($postScriptData, ')', $textStart + 1)) && substr($postScriptData, $textEnd - 1) != '\\') )
   {
     
$plainText .= substr($postScriptData, $textStart + 1, $textEnd - $textStart - 1);
      if(
substr($postScriptData, $textEnd + 1, 1) == ']' ) // This adds quite some additional spaces between the words
     
{
        
$plainText .= ' ';
      }

     
$textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;
   }

   return
stripslashes($plainText);
}
?>
michi (Alt+Q) marel.at 01-Jul-2004 07:10
<?php
/* A little helper function to convert millimeters to points */
function calcToPt($intMillimeter) {
  $intPoints = ($intMillimeter*72)/25.4;
  $intPoints = round($intPoints);
  return $intPoints;
}

/* For example: Create DIN A4 210x297 mm */
pdf_begin_page( $pdf, calcToPt(210), calcToPt(297)); // 595x842 pt
?>
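The conversion rests on 72 points per inch and 25.4 millimeters per inch. A minimal sketch of both directions without rounding follows; the names mm_to_pt() and pt_to_mm() are hypothetical. Rounding can then be applied once, at the point where a page size is passed to PDFlib.

```php
<?php
// Hypothetical helpers: convert between millimeters and PostScript points
// (72 points per inch, 25.4 mm per inch) without rounding.
function mm_to_pt($mm)
{
    return $mm * 72 / 25.4;
}

function pt_to_mm($pt)
{
    return $pt * 25.4 / 72;
}
```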
donatas at spurgius dot com 22-Jun-2004 12:56
I've been looking for a way to extract plain text from PDF documents (I needed to search for text inside them). Not being able to find one, I wrote the needed functions myself. Here you go, folks.

<?php
 
function pdf2string ($sourceFile)
  {
   
$textArray = array ();
   
$objStart = 0;
   
   
$fp = fopen ($sourceFile, 'rb');
   
$content = fread ($fp, filesize ($sourceFile));
   
fclose ($fp);
   
   
$searchTagStart = chr(13).chr(10).'stream';
   
$searchTagStartLenght = strlen ($searchTagStart);
   
    while (((
$objStart = strpos ($content, $searchTagStart, $objStart)) && ($objEnd = strpos ($content, 'endstream', $objStart+1))))
    {
     
$data = substr ($content, $objStart + $searchTagStartLenght + 2, $objEnd - ($objStart + $searchTagStartLenght) - 2);
     
$data = @gzuncompress ($data);
     
      if (
$data !== FALSE && strpos ($data, 'BT') !== FALSE && strpos ($data, 'ET') !== FALSE)
      {
       
$textArray [] = ExtractText ($data);
      }
     
     
$objStart = $objStart < $objEnd ? $objEnd : $objStart + 1;
    }
   
    return
$textArray;
  }
 
  function
ExtractText ($postScriptData)
  {
    while (((
$textStart = strpos ($postScriptData, '(', $textStart)) && ($textEnd = strpos ($postScriptData, ')', $textStart + 1)) && substr ($postScriptData, $textEnd - 1) != '\\'))
    {
     
$plainText .= substr ($postScriptData, $textStart + 1, $textEnd - $textStart - 1);
      if (
substr ($postScriptData, $textEnd + 1, 1) == ']') //this adds quite some additional spaces between the words
     
{
       
$plainText .= ' ';
      }
     
     
$textStart = $textStart < $textEnd ? $textEnd : $textStart + 1;
    }
   
    return
stripslashes ($plainText);
  }
?>
uwe at steinmann dot cx 13-May-2004 06:25
Those looking for a free replacement of pdflib may consider
pslib at http://pslib.sourceforge.net which produces PostScript but it can be easily turned into PDF by Acrobat Distiller or ghostscript. The API is very similar and even hypertext functions are supported. There
is also a php extension for pslib in PECL, called ps.
samcontact at myteks dot com 01-May-2004 04:28
Here is another great tutorial on basic PDF building w/ PHP:
http://hotwired.lycos.com/webmonkey/02/20/index3a.html?tw=programming

=======================
http://myteks.com
Computer Repair & Web Design
=======================
SenorTZ senortz at nospam dot yahoo dot com 28-Jul-2003 06:23
About creating a PDF document based on the content of another document(let's say a text file):

I tried to pass, through a link on the sender page, the name of the file whose content should go into the generated PDF document. When I called the PDF-creator page through a link such as your_root/create_pdf.php?filename=$your_file_name, the page did not behave well if it read the value with a line like $filename = $_GET["filename"] before creating the PDF document.
I solved this by replacing the link on the sender page with a form and a button: the form has "create_pdf.php" as its action, "post" as its method, and a hidden field holding the "filename" value. This works if the PDF-creator page reads the value with a line like $filename = $_POST["filename"].

I would like to understand why this way it works and the other way does not.

I hope this helps. Here are the pieces of code I used.

Sender page:
print("<form name='to_pdf' action='see_pdf_file.php' method='post'>");
print("<br/><input type='submit' value='PDF'><input type='hidden' name='filename' value='$filename'></form>");

PDF-creator page:
<?php
$filename = $_POST["filename"];
$file_handle = fopen($filename, "r");
$file_content = file_get_contents($filename);
fclose($file_handle);
//
$file_content = wordwrap($file_content,72,"|");
$a_row = explode("|",$file_content);
$i = 0;
//
$pdf = pdf_new();
pdf_open_file($pdf, "");
pdf_begin_page($pdf, 595, 842);
pdf_set_font($pdf, "Times-Roman", 16, "host");
pdf_add_outline($pdf, "Page 1");
pdf_set_value($pdf, "textrendering", 1);
pdf_show_xy($pdf, 'The content of the file:',50,700);
// stop at the end of the array or at the first empty row
while (isset($a_row[$i]) && $a_row[$i] != "")
{
      pdf_continue_text($pdf,$a_row[$i]);
      $i++;
}
pdf_end_page($pdf);
pdf_close($pdf);
//
$data = pdf_get_buffer($pdf);
//
header("Content-type: application/pdf");
header("Content-disposition: inline; filename=test.pdf");
header("Content-length: " . strlen($data));
//
echo $data;
?>

PDFlib and PHP 4.3.1 were used.

Thanks.
bmironov at jonview dot com 24-Jun-2003 03:46
RedHat 9 + Apache 2.0 + PHP 4.3.2 + Oracle 9i + PDFlib 5.0.1 (binary distribution)

It seems to be a working bundle if you do some magic with ./configure:

RedHat 9:
kernel-2.4.20-18.9

Apache 2.0.46:
./configure --enable-so --enable-rewrite=shared --enable-status --enable-mpm=prefork

PHP 4.3.2:
./configure \
--program-prefix= \
--prefix=/usr \
--exec-prefix=/usr \
--bindir=/usr/bin \
--sbindir=/usr/sbin \
--sysconfdir=/etc \
--datadir=/usr/share \
--includedir=/usr/include \
--libdir=/usr/lib \
--libexecdir=/usr/libexec \
--localstatedir=/var \
--sharedstatedir=/usr/com \
--mandir=/usr/share/man \
--infodir=/usr/share/info \
--with-config-file-path=/etc \
--with-config-file-scan-dir=/etc/php.d \
--without-tsrm-pthreads \    # !!!!!!!!!!!!!!!!!!!!
--with-zlib \
--with-gd \
--enable-gd-native-ttf \
--with-ttf \
--without-mysql \
--with-apxs2filter=/usr/local/apache2/bin/apxs \
--with-oci8 \
--enable-sigchild \
--enable-inline-optimization

Oracle9i:
ln -s $ORACLE_HOME/rdbms/public/nzerror.h $ORACLE_HOME/rdbms/demo/nzerror.h

ln -s $ORACLE_HOME/rdbms/public/nzt.h $ORACLE_HOME/rdbms/demo/nzt.h

ln -s $ORACLE_HOME/rdbms/public/ociextp.h $ORACLE_HOME/rdbms/demo/ociextp.h

If you want to use bundled GD-library then:
1) install following packages: libjpeg, libjpeg-devel, libpng, libpng-devel, freetype, freetype-devel, libtiff, libtiff-devel, zlib, zlib-devel

2) ln -s /usr/lib/libjpeg.so.62 /usr/lib/libjpeg.so
ln -s /usr/lib/libpng.so.62 /usr/lib/libpng.so

It seems to be a working combination, because it does NOT give you:
1) error message in Apache's error_log:
Module compiled with module API=20020429, debug=0, thread-safety=0
PHP compiled with module API=20020429, debug=0, thread-safety=1

2) error message in Apache's error_log:
[notice] child pid 12345 exit signal Segmentation fault (11)

3) MS Internet Explorer can show the PDF output from your PHP script via the Acrobat plug-in and does not crash. No confusing messages about opening "Adobe Acrobat Control for ActiveX".

Hope it will save you some time.

Good luck,
Boris
pbierans at lynet dot de 27-Mar-2002 09:56
Load extension, open a PDF, add a font, modify PDF in memory and send
it to browser:

<?php
 
// no cache headers:
 
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
 
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
 
header("Cache-Control: no-store, no-cache, must-revalidate");
 
header("Cache-Control: post-check=0, pre-check=0", false);
 
header("Pragma: no-cache");

 
$ext_name="libpdf_php.so";
   
// libpdf_php.so is the PDFLIB for SunOS by "PDFlib GmbH"
    // visit http://www.pdflib.com

  // if the extension is not automatically loaded by Apache
  // dl() will try to load it on demand:
 
if (!extension_loaded($ext_name) && !@dl($ext_name))
  {
   
?>
    <table width="100%" border="0"><tr><td align="center">
      <table style="border: solid #f0f0f0 2px;"><tr>
        <td valign="middle" style="padding: 20px; margin: 0px;">
          <p style="font-family: arial; font-size: 12px; ">
          <b>Sorry,</b><br>
          &nbsp;<br>
          A PDF can not be generated right now.<br>
          The administrator has been informed and will fix this as
          soon as possible.<br>
          Please try again later.
        </p>
      </td></tr></table>
    </td></tr></table>
    <?php
    // double quotes are needed so that \n is sent as a line break
    mail('admin@domain.com','Error: PDFLib not found',
         "Called by script:\n  ".$SCRIPT_FILENAME.'?'.$QUERY_STRING,
         "From: warnings@domain.com\n");
    exit;
  }
// verify that extension is usable

  // unique serial number:
 
srand(microtime()*10000);
 
$usnr= gmdate("Ymd-His-").rand(1000,9999).'-';
 
$pdf_file=$usnr.'result.pdf';
 
$src_file='source.pdf';

 
// create pdf object
 
$pdf = pdf_new();
 
pdf_open_file($pdf);
 
pdf_set_parameter($pdf, 'serial',      'if-you-have-one');

 
// fonts to embed, they are in the folder of this file:
 
pdf_set_parameter($pdf, 'FontAFM',     'TradeGothic=Tg______.afm');
 
pdf_set_parameter($pdf, 'FontOutline', 'TradeGothic=Tg______.pfb');
 
pdf_set_parameter($pdf, 'FontPFM',     'TradeGothic=Tg______.pfm');

 
// load the source file:
 
$src_doc   =pdf_open_pdi($pdf,$src_file,'', 0);
 
$src_page  =pdf_open_pdi_page($pdf,$src_doc,1,'');
 
$src_width =pdf_get_pdi_value($pdf,'width' ,$src_doc,$src_page,0);
 
$src_height=pdf_get_pdi_value($pdf,'height',$src_doc,$src_page,0);

 
pdf_begin_page($pdf, $src_width, $src_height);
  {
   
// place the sourcefile to the background of the actual page:
   
pdf_place_pdi_page($pdf,$src_page,0,0,1,1);
   
pdf_close_pdi_page($pdf,$src_page);

   
// modify the page:
   
pdf_set_font($pdf, 'TradeGothic', 8, 'host');
   
pdf_show_xy($pdf, 'Now: '.gmdate("Y-m-d H:i:s"),50,50);
  }
 
pdf_end_page($pdf);
 
pdf_close($pdf);

 
// prepare output:
 
$pdfdata = pdf_get_buffer($pdf); // to echo the pdf-data
 
$pdfsize = strlen($pdfdata);     // IE requires the datasize

  // real datatype headers:
 
header('Content-type: application/pdf');
 
header('Content-disposition: attachment; filename="'.$pdf_file.'"');
 
header('Content-length: '.$pdfsize);
  echo
$pdfdata;
  exit;
// keep this one so no #13#10 or #32 will be written
?>

 