This Scan Client represents one part of a multipart Document Archive solution.
The Document Archive solution consists of:
- a Scan Client, which is responsible for scanning documents, optimizing the scan quality, minimizing the file size of the scans, gathering metadata (e.g. a list of keywords characterizing the document's content, the document title, and the document date), creating a unique document ID, and converting single-page and multi-page scanned documents into PDF files enriched with that metadata.
- a Web Page Generator, which generates a static HTML web page from the scanned documents and the gathered metadata. All documents are organized in an index with two search criteria:
  - grouped by the year of the document date, then sorted by descending document date
  - grouped by one keyword of the document's keyword list, then sorted by descending document date

  Each list entry is linked to one document, represented by its PDF file.
  The static HTML web page can either be browsed locally from the file system with a web browser (e.g. Google Chrome or Firefox) or hosted on a simple web server running Apache, NGINX, or any similar web server.
  For detailed information about the Web Page Generator, please refer to GitHub: Web Page Generator

Hint: The content of a document can be characterized by several keywords, not just one.
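The two index views described above can be illustrated with a small Python sketch. This is not the Web Page Generator's actual code: the `Document` record, its field names, and the sample data are hypothetical, and it assumes that a document with several keywords is listed once under each of them.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical metadata record; the field names are illustrative only.
    doc_id: str
    title: str
    doc_date: date
    keywords: tuple


def index_by_year(documents):
    """Group documents by year, newest document first within each year."""
    index = defaultdict(list)
    for doc in sorted(documents, key=lambda d: d.doc_date, reverse=True):
        index[doc.doc_date.year].append(doc)
    return dict(index)


def index_by_keyword(documents):
    """Group documents by keyword, newest first (assumed: one entry per keyword)."""
    index = defaultdict(list)
    for doc in sorted(documents, key=lambda d: d.doc_date, reverse=True):
        for keyword in doc.keywords:
            index[keyword].append(doc)
    return dict(index)


# Made-up sample documents.
docs = [
    Document("doc-001", "Invoice", date(2023, 3, 14), ("tax", "invoice")),
    Document("doc-002", "Contract", date(2022, 11, 2), ("insurance",)),
    Document("doc-003", "Tax notice", date(2023, 8, 1), ("tax",)),
]

by_year = index_by_year(docs)
by_keyword = index_by_keyword(docs)
print([d.doc_id for d in by_year[2023]])      # ['doc-003', 'doc-001']
print([d.doc_id for d in by_keyword["tax"]])  # ['doc-003', 'doc-001']
```

Sorting once by descending date before grouping keeps every group in the required order without a second sort per group.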
The Scan Client relies heavily on additional applications, which need to be installed as follows:
Refer to FOSS Linux: Your complete guide to installing Python on Debian
root@debianvm:~# apt install python3 python3-pip
root@debianvm:~# python3 --version
Python 3.11.2
root@debianvm:~# pip3 --version
pip 23.0.1 from /usr/lib/python3/dist-packages/pip (python 3.11)
root@debianvm:~# git --version
git version 2.39.2
root@debianvm:~# apt update
root@debianvm:~# apt install rsync
root@debianvm:~# rsync --version
rsync version 3.2.7 protocol version 31
root@debianvm:~# wget --version
GNU Wget 1.21.3 built on linux-gnu.
root@debianvm:~# curl --version
curl 7.88.1 (x86_64-pc-linux-gnu) ...
root@debianvm:~# apt install sane
root@debianvm:~# apt install libsane
Test of successful installation:
root@debianvm:~# scanimage --version
scanimage (sane-backends) 1.1.1-debian; backend version 1.1.1
Refer to How to Install ImageMagick on Debian 12, 11 or 10
root@debianvm:~# apt install libpng-dev libjpeg-dev libtiff-dev
root@debianvm:~# apt install imagemagick
Test of successful installation:
root@debianvm:~# convert --version
Version: ImageMagick 6.9.11-60 Q16 x86_64 2021-01-25 https://imagemagick.org
...
root@debianvm:~# apt install dialog
Test of successful installation:
root@debianvm:~# dialog --version
Version: 1.3-20230209
For details on installing scanner support for your scanner, please refer to Scanner Installation and Configuration.
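Instead of running each `--version` command by hand, the presence of the tools can be checked in one pass. The following is a minimal sketch, not part of the Scan Client: the list of executable names is an assumption derived from the commands shown above (for example `convert` for ImageMagick 6), and it only checks that each executable is on `PATH`, not its version.

```python
import shutil

# Executable names taken from the installation steps above (assumed list).
REQUIRED_TOOLS = [
    "python3", "pip3", "git", "rsync", "wget", "curl",
    "scanimage", "convert", "dialog",
]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in a test; in normal use the default `shutil.which` is sufficient.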
The minimal requirements for the Scan_Client application are:
- Python is installed in Version 3.7 or higher.
  In the following installation example I assume Python 3.8.
  For Python download and installation instructions, please refer to https://wiki.python.org/moin/BeginnersGuide/Download.
- PIP is available. If not, install it with the following command:
  $> python -m ensurepip --upgrade
  For further details, please refer to https://pip.pypa.io/en/stable/installation/
- Virtual Environment is available. If not, install it with:
  $> pip install --user virtualenv
- dialog is available. If not, install it with the following commands (example for Debian Linux):
  $> sudo apt update
  $> sudo apt install dialog
  $> dialog --version
  You should get an output similar to this:
  Version: 1.3-20160828
  Please refer to the article on O'Reilly
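The Python-related requirements can also be checked from Python itself. This is a hedged sketch rather than a script shipped with the project: the helper names are hypothetical, and a `virtualenv` installed for a different interpreter or user would not be detected by this check.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the requirements above


def python_is_new_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Compare the (major, minor) part of a version tuple with the minimum."""
    return tuple(version_info[:2]) >= minimum


def module_available(name):
    """Return True if a top-level module can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None


checks = {
    "Python >= 3.7": python_is_new_enough(),
    "pip": module_available("pip"),
    "virtualenv": module_available("virtualenv"),
}
for label, ok in checks.items():
    print(f"{label}: {'ok' if ok else 'missing'}")
```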
To run the Scan_Client application in an isolated environment, create a
virtual environment somewhere in your $HOME directory.
$> virtualenv --python=python3.8 scan_client
Activate the environment:
$> cd scan_client
$> source bin/activate
Then check the availability of the right Python version:
$> python --version
Python 3.8.0
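If it is unclear whether the environment was really activated, the interpreter can report this itself. The sketch below is an illustration, not project code: it relies on `sys.prefix` differing from `sys.base_prefix` inside a virtual environment, and additionally checks `sys.real_prefix`, which older virtualenv releases set.

```python
import sys


def in_virtual_environment():
    """Return True when this interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Version:", ".".join(str(part) for part in sys.version_info[:3]))
print("Virtual environment active:", in_virtual_environment())
```

Run inside the activated environment, the interpreter path should point into the `scan_client` directory.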
Then check if pip is available:
$> python -m ensurepip --upgrade
You should get an output similar to this:
Looking in links: /tmp/tmpok5o5sn_
Requirement already up-to-date: setuptools in ./lib/python3.8/site-packages (60.3.1)
Requirement already up-to-date: pip in ./lib/python3.8/site-packages (21.3.1)
Download the requirements.txt file from the GitHub project and place it in
the root directory of the freshly created virtual environment. Then install the
required Python libraries:
$> pip install -r requirements.txt
You should get an output similar to this:
Collecting fpdf==1.7.2
Using cached fpdf-1.7.2-py2.py3-none-any.whl
Collecting numpy==1.19.0
Using cached numpy-1.19.0-cp38-cp38-manylinux2010_x86_64.whl (14.6 MB)
Collecting Pillow==7.2.0
Using cached Pillow-7.2.0-cp38-cp38-manylinux1_x86_64.whl (2.2 MB)
Collecting scipy==1.5.1
Using cached scipy-1.5.1-cp38-cp38-manylinux1_x86_64.whl (25.8 MB)
Installing collected packages: numpy, scipy, Pillow, fpdf
Successfully installed Pillow-7.2.0 fpdf-1.7.2 numpy-1.19.0 scipy-1.5.1
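To confirm afterwards that the installed versions match the pinned ones, the pins can be compared with the package metadata. This is a minimal sketch under stated assumptions: it requires Python 3.8 or newer for `importlib.metadata`, it only understands plain `name==version` lines, and the pin list is copied from the pip output above rather than read from the project's actual requirements.txt.

```python
from importlib import metadata  # standard library since Python 3.8


def parse_pins(requirements_text):
    """Parse 'name==version' lines; blank lines and comments are skipped."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (wanted, found)} for missing or differing packages."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Pins as they appear in the pip output above.
EXAMPLE_REQUIREMENTS = """\
fpdf==1.7.2
numpy==1.19.0
Pillow==7.2.0
scipy==1.5.1
"""

pins = parse_pins(EXAMPLE_REQUIREMENTS)
for name, (wanted, found) in find_mismatches(pins).items():
    print(f"{name}: expected {wanted}, found {found}")
```

In practice the text would come from the downloaded file, for example via `open("requirements.txt").read()`.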
Now you can either download the remaining files of the GitHub project into the virtual environment or simply clone the current git repository.
The following files need to be edited with a text editor of your choice (e.g. vim):
- scanapp.desktop
  Edit the following lines:
  Exec=/path/to/scan_client/run.sh
  Icon=/path/to/scan_client/scan_app.png
  Path=/path/to/scan_client
- run.sh
  Edit the following line:
  PROJECT_DIR=/path/to/scan_client
- Copy config_handler_template.sh to config_handler.sh:
  $> cp config_handler_template.sh config_handler.sh
  and edit the following line:
  SCAN_ARCHIVE_BASE_DIRECTORY="/path/to/document_archive"
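The manual edits above all replace the value of a `KEY=value` line, so they could also be scripted. The following sketch is only an illustration of that idea: the function `set_assignments`, the install directory `/home/alice/scan_client`, and the template text are hypothetical, and the real files may contain further lines that this example does not show.

```python
import re


def set_assignments(text, values):
    """Replace the value of every `KEY=...` line whose KEY is in `values`."""
    def replace(match):
        key = match.group(1)
        if key in values:
            return f"{key}={values[key]}"
        return match.group(0)  # leave unrelated assignments untouched

    pattern = r"^([A-Za-z_][A-Za-z0-9_]*)=.*$"
    return re.sub(pattern, replace, text, flags=re.MULTILINE)


install_dir = "/home/alice/scan_client"  # hypothetical installation path

# Placeholder lines as listed above for scanapp.desktop.
desktop_template = """\
Exec=/path/to/scan_client/run.sh
Icon=/path/to/scan_client/scan_app.png
Path=/path/to/scan_client
"""

desktop_text = set_assignments(desktop_template, {
    "Exec": f"{install_dir}/run.sh",
    "Icon": f"{install_dir}/scan_app.png",
    "Path": install_dir,
})
print(desktop_text)
```

The same function would work for `PROJECT_DIR` in run.sh and for `SCAN_ARCHIVE_BASE_DIRECTORY` in config_handler.sh, provided the quotes are included in the replacement value where the file expects them.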