DocMGR was written and tested under PHP 5.0.4 (Apache webserver 1.3.33) and PostgreSQL 8.0.1 on a Slackware box. As for the OS, any Linux or other Unix-based OS should do nicely. Because the JavaScript code used in DocMGR is fairly new, the client will need at least IE 5.5 SP2 or Netscape 7.0/Mozilla 1.0 for DocMGR to work correctly. On the server side, you will need at least PostgreSQL 7.4.0 and PHP 4.3.7.
DocMGR also requires some standard Unix programs for indexing files: tr, cat, ps, and which. If it cannot find these in Apache's path, it will report an error when you first access the program. These programs are pretty standard, so I don't anticipate any problems.
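If you would like to check for these programs yourself before the first run, a minimal sketch follows. It is not part of DocMGR; the function name missing_programs is made up for illustration. It checks the PATH of the shell you run it from, which may differ from Apache's path.

```sh
#!/bin/sh
# Hypothetical helper (not shipped with DocMGR).
# Prints the name of every given program that cannot be found on PATH,
# one per line. Prints nothing if all of them are found.
missing_programs() {
    for prog in "$@"; do
        # "command -v" is the POSIX way to look a program up on PATH
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
        fi
    done
}

# The four programs DocMGR needs for indexing files
missing_programs tr cat ps which
```

No output means all four programs were found. To get closer to what Apache sees, run the script as the user Apache runs as.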
To install DocMGR:
Back up your database first!!!
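One way to take that backup is with PostgreSQL's pg_dump. The sketch below is only an illustration: the database name "docmgr" and the function names are assumptions, so substitute the name of your own database.

```sh
#!/bin/sh
# Hypothetical helpers (not shipped with DocMGR).

# Build a dated backup file name such as docmgr-20050412.sql
backup_name() {
    echo "$1-$(date +%Y%m%d).sql"
}

# Dump the named database to a dated plain-SQL file in the current directory
backup_db() {
    pg_dump "$1" > "$(backup_name "$1")"
}

# Usage, run as a user with access to the database
# ("docmgr" is an assumed database name):
#   backup_db docmgr
```

A plain-SQL dump like this can be restored into an empty database with psql if the upgrade goes wrong.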
To upgrade from 0.44-0.49.x to 0.53, you must upgrade to 0.50 first. First cd to the scripts/upgrade50 directory and follow the directions below. After completing these steps, run the upgrade.php file in the scripts/ directory to complete the upgrade to 0.51.
The latest version of DocMGR has several additional features that require outside software to be installed. The features and their required software are listed below. I apologize for the large number of outside requirements, but this software is needed to allow DocMGR to function as a more complete Document Management System. DocMGR will automatically determine which programs are installed and available to Apache, and will configure itself accordingly. Again, these additional features are optional and can be completely disabled in the config.php file.
Image OCR lets the content of your images be indexed, so you can look up a scanned page (or whatever) by its content. It requires GOCR (http://jocr.sourceforge.net), ImageMagick (http://www.imagemagick.org), and LibTiff (http://www.remotesensing.org/libtiff/) to work properly. You probably already have ImageMagick and LibTiff installed on your system. Drop to a command prompt and type 'convert' for ImageMagick and 'tiffinfo' for LibTiff, and see what you get. If you don't have them, you can download the packages from the URLs above. To enable OCR support, DocMGR needs to be able to find the gocr, mogrify, convert, tiffinfo, and tiffsplit binaries, so they must be in Apache's path.
PDF support allows the content of your PDF files to be indexed. I highly recommend this feature, especially if you use the PDF format for a majority of your scanned documentation. You have two choices here, xpdf and ghostscript. Xpdf allows for faster indexing of PDFs. It will also autorotate your PDF pages for better OCRing, so I definitely recommend it over ghostscript. For now, DocMGR supports either. If you have both installed, it will favor xpdf over ghostscript. You can get xpdf from http://www.foolabs.com/xpdf/. DocMGR requires xpdf version 3.0 or later. If you use ghostscript, it must be version 6.52 or higher. If you do not have at least this version, you can download it from http://www.ghostscript.com
To index PDFs using ghostscript, just make sure the gs binary is in Apache's path. To use xpdf, make sure the pdftotext, pdfimages, and pdftoppm binaries are in Apache's path. They are probably in /usr/X11R6/bin by default. If you want to index encapsulated PDFs as well (like the ones from a copier), you'll need to follow the above steps for OCR support as well.
Note: Ghostscript requires zlib, libpng, and jpeg-6b to compile. You can download them from the "3rdparty" directory of the ghostscript ftp site. Just untar them in the top level directory of the ghostscript source, and rename the directories to zlib, libpng, and jpeg, respectively.
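The preference for xpdf over ghostscript described above can be sketched roughly as follows. This is an illustration of the stated rule, not DocMGR's actual detection code. The function names are made up, and the version requirements (xpdf 3.0, ghostscript 6.52) are not checked here.

```sh
#!/bin/sh
# Hypothetical helpers (not DocMGR's own code).

# Succeeds if the named program can be found on PATH
have() {
    command -v "$1" >/dev/null 2>&1
}

# Prints which PDF indexing backend would be chosen on this PATH:
# xpdf if all three of its binaries are present, otherwise ghostscript
# if gs is present, otherwise none.
pick_pdf_backend() {
    if have pdftotext && have pdfimages && have pdftoppm; then
        echo xpdf
    elif have gs; then
        echo ghostscript
    else
        echo none
    fi
}

pick_pdf_backend
```

As with any such check, the answer depends on the PATH of whoever runs it, so run it as the Apache user if you want to see what DocMGR is likely to find.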
Like Image OCR, thumbnail support requires imagemagick and libtiff to work properly. See above for the packages' home pages. Text file thumbnails require enscript, which is probably already on your machine.
You can now email files to any email address. You can use this feature if sendmail is installed and running on your system, or if IMAP support is compiled into PHP.
If email support is enabled, you may also send files to non-DocMGR users. These users are emailed a unique link and PIN which may be used to access the desired file. When sending the email, the sender designates the length of time the link is valid, and can choose to be notified via email or SMS when the recipient views the file.
DocMGR can index any URLs you link to. This will be enabled if you have "wget" installed on your system.
You can download a zipped version of any collection. The utility "zip" must be installed on your system.
DocMGR allows you to assign up to six keywords to an uploaded file. You can configure the allowed keywords by setting the KEYWORD section in the config/config.php file. For example, if I wanted to allow two keyword fields named "Customer Number" and "Invoice Number", I would uncomment and change the define("KEYWORDX") lines to read as follows:
Text fields with the names "Customer Number" and "Invoice Number" will appear in the Upload module and in the File Properties module, allowing the user to edit these values for the file. Remember, you can define up to 6 different keyword fields. In the Find module a separate area for searching by keyword will also appear.
To remove keyword searching, just comment out the above lines.
For those of you running larger DocMGR installations, you'll want to use tsearch2 for your document indexing. It gives much faster searches of your documents and ranks the results. Tsearch2 installation instructions for DocMGR are listed below.
All of the above features are optional, but I highly recommend them for a more efficient Document Management System.
If clamav is installed on your system, DocMGR will use it to scan files for viruses at upload, import, view, checkin/checkout, and email. If a virus is found, the virus will be reported, and the action will be cancelled. DocMGR looks for the "clamscan" binary to enable this feature. ClamAV may be downloaded from http://www.clamav.net.
DocMGR will convert the character encoding of certain file types to the encoding of your database before indexing. This allows for more accurate searches in non-English languages. DocMGR will use the iconv binary if available, followed by the PHP iconv function. If neither is found, no conversion will take place.
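To see what such a conversion does, here is a small sketch using the iconv binary. It is only an illustration: the function name to_db_encoding is made up, the PHP fallback is left out, and the example encodings (ISO-8859-1 to UTF-8) are assumptions about your files and database.

```sh
#!/bin/sh
# Hypothetical helper (not DocMGR's own code).
# Converts text on stdin from encoding $1 to encoding $2 with the iconv
# binary. If iconv is not on PATH, the text is passed through unchanged,
# mirroring the "no conversion" case.
to_db_encoding() {
    if command -v iconv >/dev/null 2>&1; then
        iconv -f "$1" -t "$2"
    else
        cat
    fi
}

# Example: the word "café" stored as ISO-8859-1, where the accented
# letter is the single byte with octal value 351
printf 'caf\351\n' | to_db_encoding ISO-8859-1 UTF-8
```

After conversion the accented letter is stored as two bytes in UTF-8, which is what lets a search typed in the database's encoding match the indexed text.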
By default, DocMGR has MS Word indexing support. However, the results may not always be completely accurate. With antiword installed, the Word document is converted to text before being indexed, and a thumbnail of the document is created. You may download antiword from http://www.winfield.demon.nl.
When a file is uploaded or updated in the system, an MD5 checksum of the file is created and saved in the database. On any type of file retrieval, the current checksum is verified against the stored value in the database before the file may be viewed. If the checksums do not match, the user will be notified and file viewing will not be allowed. Also, a "Digital Signature" file (checksum.md5) may be sent with an email so the recipient may verify the file upon retrieval. The sending of the checksum.md5 attachment may be disabled in config.php.
By setting this option in config.php, non-administrative users will not be able to remove files from the system. These users will still be able to upload/move files if their permissions are set accordingly.
The revision limit lets you cap the number of past revisions kept for a file. This may be desirable if disk space is limited on your server. In addition, you may set FILE_REVISION_REMOVE to allow a user to selectively remove past revisions of a file in the File's History module.
DocMGR now supports non-English languages. You may translate the lang/English.php file to the language of your choice, and drop your translation into the lang/ directory. DocMGR will find the new translation automatically and make it available to your users. See the Language page for more information.
Any language files you create may be submitted back to me to be posted on the DocMGR Language download page.
If you decide your document repository has outgrown DocMGR's simple indexing system, you may use tsearch2 for full text indexing. Your searches will be faster, and your search results will be ranked.
If you use tsearch2, I highly recommend you apply the regprocedure_update.sql patch available on the tsearch2 website. It allows for easier backup and restoration of your tsearch2-based database, avoiding the complicated steps otherwise needed. I have tested the patch, and it works just fine with DocMGR.
To add tsearch2 to your database, first complete the DocMGR installation and/or upgrade steps above. Then perform the following steps:
These scripts are located in the DocMGR scripts/ directory. They may be used for maintenance, upgrading, or other specialized tasks. A brief description of the available scripts is below.
Recreates thumbnails for all supported file types in the system.
This file imports documents from a specified directory. It will delete the documents in that directory when the import is finished. It is intended to be run as a cron job. At the top of the file, you may set the directory to import to and the user to import the documents as.
This file indexes pending documents in the background. Just set CRON_INDEX in the config.php file and add this file as a cron job.
For PostgreSQL 8.0.2 and earlier. This SQL script makes the DocMGR database tsearch2 ready.
For PostgreSQL 8.0.3 and later. This SQL script makes the DocMGR database tsearch2 ready.
The original DocMGR database creation SQL script.
This file updates a tsearch2-enabled DocMGR database to allow for easier backups. It should only be required for 7.4.x versions of PostgreSQL.
Reindexes all documents. Some upgrades may require this script to be run. You'll also need to run it if you switch to or from tsearch2.
Upgrades the DocMGR database from 0.50.x to 0.53.
This file upgrades 0.44-0.49.x to 0.50. This must be run before running the 0.51 migration script if you are running 0.44-0.49.x. Run it from within the upgrade50/ directory.