KM-GEN is an unsupervised classifier for finding groups of similar or dissimilar images (anomalies) in large collections. It is used for classifying visual data, e.g. travel vlogs or security-camera snapshots and videos, and for semi-automatic labelling of training datasets. Commonly occurring frames can be filtered out, leaving only images which are relatively unique and thus may be of interest. A long video can be converted into a short time-lapse video of highlights, or a large image collection can be condensed into a slide show of relatively unique images. KM-GEN can also automatically label large image repositories for machine learning with PyTorch etc.
Six images of two different types of flowers classified into two clusters using `imgdist=3` (Hu's moment invariants with colour support).
python train-km-mp.py on 1 2
python predict-km.py i -1
The `-1` option of `predict-km.py` allows selecting specific clusters to find out which images are present in each cluster.
Alternatively, any cluster can be inspected directly, e.g. `feh -f cluster_0.txt`
A video summary of motion detection images collected with .
- The images should be of adequate resolution, e.g. 480 x 640 or above.
- The images should have adequate features, such as in street scenes, landscapes, objects, people etc. For instance, trying to analyse tiny MNIST images or very dark scenes will not work, as these are of extremely low resolution/contrast and thus not amenable to feature analysis. Feature analysis can however be replaced with full-image analysis by enabling the `imgfull` and `img_bw` options in `config.py`; reducing `imght` can benefit images with scant details.
- The `imgfull` option should be enabled for low-light frames such as those captured in the night-vision mode of security cameras, or the `nfts` value can be lowered to double digits in this case. Alternatively `KM-MOD` may be used, which is designed specifically for security cameras.
The algorithm classifies images into clusters using KMeans. When the number of clusters is close to optimal, the clusters whose sizes fall within the first quartile (25%) tend to contain the interesting images.
NB: `train-km-mp.py` option 0 enables elbow analysis, which is a good way of finding the optimal number of clusters for the data set.
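The idea behind elbow analysis can be sketched as follows. This is a minimal illustration with scikit-learn, using random stand-in data rather than KM-GEN's actual feature extraction:

```python
# Elbow-analysis sketch: inertia (within-cluster sum of squares) is
# computed for a range of k; the k where the drop flattens out (the
# "elbow") is a good choice. Features here are random stand-ins.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))        # stand-in for per-image feature vectors

inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

print(inertias)   # decreasing; look for where the decrease levels off
```

Plotting `inertias` against `k` (e.g. with matplotlib, already installed above) makes the elbow visible at a glance.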
- RPI5 with 8 GB is highly recommended; however, RPI4B with 4 GB should be adequate in most cases.
- Python 3.7.3 or higher
sudo apt update
sudo apt upgrade
sudo apt install ffmpeg
python -m pip install -U pip
python -m pip install -U scikit-image
pip install opencv-python
pip install shutils
pip install -U scikit-learn
pip install matplotlib
pip install tqdm
- `ImgPath` in `config.py` needs to be edited to `my_output_folder`, or whatever you may have named it. Other parameters can be left as is for the time being.
- Set the path variables at the start of the `moviefrm-list`, `moviefrm-list-ni`, and `utils/done-driver-mp` bash scripts to the actual paths on your computer. NB: the variable `DV` value in `utils/daily-driver-mp` and `utils/date-driver-mp` must be exactly the same as in `moviefrm-list-ni` if using these scripts. The paths in these scripts also have to be edited as above.
Clone this repository then extract frames from any MP4 movie clip (not included):
git clone https://github.com/SensorAnalyticsAus/KM-GEN.git
cd KM-GEN
./utils/fextract my_travel_vlog.mp4 my_output_folder 1
$ /path/to/.venv/bin/python train-km-mp.py on 1 10. Here `on` shows the progress bar, `1` runs in normal mode, and `10` is the number of clusters to use for training on the images. Ten is usually a good number to start with, e.g. for YouTube videos; however, a more precise value should be obtained by using option 0 (elbow analysis).
$ /path/to/.venv/bin/python predict-km.py ni 25. With the `ni` option the predict module runs in non-interactive mode and gathers the clusters whose sizes are less than or equal to the 25th percentile.
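The percentile filter can be illustrated as follows. This is a sketch of the idea only, not `predict-km.py`'s actual code, and `small_clusters` is a hypothetical helper:

```python
import numpy as np

def small_clusters(labels, pct=25):
    """Return ids of clusters whose member count is at or below the
    pct-th percentile of all cluster sizes (hypothetical helper
    sketching the idea behind the ni mode)."""
    ids, counts = np.unique(labels, return_counts=True)
    cutoff = np.percentile(counts, pct)
    return [int(i) for i, c in zip(ids, counts) if c <= cutoff]

# Toy assignment: clusters 0 and 1 are common, 2 and 3 are rare.
labels = np.array([0] * 50 + [1] * 40 + [2] * 5 + [3] * 3)
print(small_clusters(labels, 25))   # only the rarest cluster(s) survive
```

Small clusters are exactly the "relatively unique" frames described in the introduction, which is why the low percentiles are the interesting ones.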
Edit the `moviefrm-list` shell script and change the following variables to your own values:
DIRP=/mnt/SSD
DV=YT
$ ./moviefrm-list 1 ffnames.txt. This will create a time-lapse video of the frames selected in Step 2 and display it at 1 frame/sec.
Invariant methods are not overly affected by the images being rotated.
Setting `imgdist > 0` enables invariant pattern-recognition methods, such as ORB descriptors and Hu's moment invariants, to be used instead of keypoint features. Generally Euclidean distance is used; for ORB descriptors, however, an index frame is randomly chosen and the Hamming distances of all other frames are calculated with reference to this frame.
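The Hamming-distance step can be sketched as follows. The 256-bit descriptors here are random stand-ins for ORB output, not KM-GEN's actual data (OpenCV's `cv2.norm(a, b, cv2.NORM_HAMMING)` computes the same bit count on packed `uint8` arrays):

```python
# Hamming-distance sketch: pick an index frame and count differing bits
# between its binary descriptor and every other frame's descriptor.
import numpy as np

rng = np.random.default_rng(0)
descs = rng.integers(0, 256, size=(6, 32), dtype=np.uint8)  # 6 frames, 256 bits each

idx = 0                                  # index frame (chosen at random in KM-GEN)
dists = [int(np.unpackbits(descs[idx] ^ d).sum()) for d in descs]
print(dists)                             # distance of the index frame to itself is 0
```

The XOR isolates the differing bits; `unpackbits(...).sum()` counts them, which is exactly the Hamming distance between the two binary descriptors.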
The following `imgdist` values select different pattern-recognition algorithms, except when the `imgfull` option is enabled.
- 0: ORB keypoint features
- 1: ORB descriptors
- 2: Hu moment invariants on grayscale images
- 3: Hu moment invariants with RGB support
- 4: Colour histograms
- 5: Upper-left-corner coefficients of the image's discrete cosine transform (DCT)
- 6: Eigenvalues of single objects against a uniform background (as in Eigenfaces)
- 7: Image contours and entropy for motion-detection in security camera frames
The `img_bw` flag for converting images to black and white is accepted for the `imgfull` and `imgdist = 0,1,2,3` options. NB: Enabling `imgfull` overrides all the above options.
- `./utils/done-driver-mp` accepts `-h` to display usage information. This is a general-purpose utility which runs in batch mode with user-specified parameters to create a time-lapse video of all images in a folder.
- `./utils/fextract` accepts `-h` to display usage information. This utility is for extracting images from videos. It provides optional parameters `[skip_no_ts|simple_no_ts]` for extracting frames without the default timestamps (in secs), either by skipping non-key frames or by using the default `ffmpeg` mode.
- `./utils/save-km` usage: (unknown). Utility to save a trained KMeans model for re-use in `train-km-mp.py` or `predict-km.py`. `ImgPath` must point to the same images folder with which the model was trained.
- `./utils/daily-driver-mp` accepts `on|off` to display the progress bar or run in silent mode (e.g. for use in cron). This utility is for security-cam images with filenames in OCD3 or Foscam date-time format (e.g. `img_20240515-223903_019269.jpg`). It runs in batch mode, collecting all images from now until 12 hours in the past for a time-lapse summary of events. Recommended `imgdist=3`.
- `./utils/date-driver-mp` accepts `-h` to display usage information. This utility is also for security-cam images. It converts images from a user-specified date-time range into a time-lapse video. Recommended `imgdist=3`.
- `ffnames2images` copies images listed in `ffnames.txt` (or, say, `clustsOut/cluster_0.txt`) to a user-specified destination folder, e.g. `clustsOut/root` for PyTorch training (images are copied to `clustsOut/root/cluster_0/` in this case). NB: For images with motion, say from videos and security cameras, black-and-white images of movement-area contours can vastly reduce image sizes and improve learning; OCD3 automatically creates such images in its `images_cn/` folder.
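The copy step can be sketched in a few lines of Python; `copy_cluster` is a hypothetical stand-in for the `ffnames2images` script, shown only to illustrate the `clustsOut/root/<cluster>/` layout that PyTorch's `ImageFolder` expects:

```python
# Copy the images listed in a cluster file into <dest_root>/<cluster>/,
# mirroring the layout ffnames2images produces for PyTorch training.
import shutil
from pathlib import Path

def copy_cluster(list_file, dest_root):
    dest = Path(dest_root) / Path(list_file).stem   # e.g. root/cluster_0
    dest.mkdir(parents=True, exist_ok=True)
    for line in Path(list_file).read_text().splitlines():
        src = Path(line.strip())
        if src.is_file():
            shutil.copy2(src, dest / src.name)      # copy, preserving metadata
    return dest
```

For example, `copy_cluster('clustsOut/cluster_0.txt', 'clustsOut/root')` would place the listed images in `clustsOut/root/cluster_0/`.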
- An incorrect path being set in `config.py` or the bash scripts.
- Too few images being selected. Either `nfts` can be progressively lowered towards a minimum of 3, or the `imgfull` analysis option may be invoked.
- Images are in an unrecognised format; convert all such images to JPG.
- Image sizes differ.
- Not getting good clustering with `imgdist=0|1`? Increase `nfts`. Note: increasing `nfts` impacts neither the `imgfull=1` nor the `imgdist > 1` options.

