ktoole / FileBatch / 1.1.3

Overview

This is a utility algorithm that runs any other algorithm that takes either a single file or a list of files as input against a directory of files. FileBatch launches parallel calls to the target algorithm so that batches of files, each of the specified batch size, are processed simultaneously.
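As a rough illustration of the batching idea, the sketch below splits a list of file paths into batches of a given size. The helper name `make_batches` and the file names are hypothetical; this README does not show FileBatch's actual internals.

```python
from typing import List


def make_batches(files: List[str], batch_size: int) -> List[List[str]]:
    """Split files into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


# Hypothetical directory listing with 7 files, processed in batches of 3.
files = [f"data://.my/filesToProcess/img{i}.jpg" for i in range(7)]
batches = make_batches(files, 3)
print([len(b) for b in batches])  # [3, 3, 1]
```

The last batch is simply smaller when the file count is not a multiple of the batch size.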

Applicable Scenarios and Problems

Use FileBatch when you need to run an algorithm against every file in a directory and get the aggregated results back in a single result file.

Usage

Input

[Input Directory], [Target Algorithm], [File Parameter Name], [Output Directory], [Additional Parameters], [Batch Size], [Takes-File-List-Flag]

##Input Directory
String. Data connector string for the directory containing the files to process.

##Target Algorithm
String. URI of the target algorithm, as you would use it in a pipe() command.

##File Parameter Name
String. Name of the target algorithm's input parameter that takes a file name or a list of files.

##Output Directory
String. Data connector string for the directory/collection you want the result file written to.

##Additional Parameters
String. JSON string of the target algorithm's input parameters other than the file input parameter. This exact string is passed with every call to the target algorithm.

##Batch Size
Integer. The number of files to process per batch.

##Takes-File-List-Flag
Boolean. true if the target algorithm's file parameter takes a list of files, false if it takes only a single file.
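To make the positional layout concrete, here is a minimal sketch that assembles the seven-element input list and serializes it to JSON. The helper `build_filebatch_input` is a hypothetical convenience, not part of FileBatch; the values are taken from the first example in this README. Serializing with `json.dumps` takes care of escaping the quotes inside the Additional Parameters string.

```python
import json
from typing import Any, List


def build_filebatch_input(
    input_dir: str,
    target_algo: str,
    file_param: str,
    output_dir: str,
    additional_params: str,
    batch_size: int,
    takes_file_list: bool,
) -> List[Any]:
    """Return the positional input list in the order this README documents."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int) or batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if not isinstance(takes_file_list, bool):
        raise TypeError("takes_file_list must be a boolean")
    return [input_dir, target_algo, file_param, output_dir,
            additional_params, batch_size, takes_file_list]


payload = build_filebatch_input(
    "data://.my/filesToProcess",
    "deeplearning/CaffeNet/2.0.1",
    "image",
    "data://.my/TestResults",
    '"numResults":2',  # passed through verbatim as a string
    3,
    False,
)
print(json.dumps(payload))
```

The printed JSON is what would be sent as the FileBatch input; how you submit it (client library, HTTP call) is outside the scope of this sketch.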

Output

Output will be consolidated and written to a single output file. The path to the output file is the FileBatch return string.

Examples

#1. Call algo that has a named parameter for a single file and an additional parameter called 'numResults' in batches of 3 for all files in a directory:

["data://.my/filesToProcess", "deeplearning/CaffeNet/2.0.1", "image", "data://.my/TestResults", "\"numResults\":2", 3, false]

#2. Call algo that takes only a list of files and no other parameters and doesn't name the list parameter:

["data://.my/filesToProcess", "algorithmiahq/Algorithm_Orchestration_Example/0.1.0", "", "data://.my/TestResults", "", 3, true]

#3. Call algo that has a named parameter that takes a list of files and an additional parameter called 'operation':

["data://.my/filesToProcess", "ktoole/FileListExample/0.1.0", "fileList", "data://.my/TestResults", "\"operation\":\"bytes\"", 5, true]

The above calls will perform the following actions:

  • Break up the list of files found in the "data://.my/filesToProcess" collection into batches of the specified number of files each.
  • Run an execution thread to call the specified algo for each batch, with the specified 'additional parameters' appended to each call.
  • Write the results for each file in the input directory into a single result file placed in the "data://.my/TestResults" collection.
  • Return the full path to the result file.
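The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not FileBatch's actual source: `call_target` is a stand-in for the real call to the target algorithm, and the shape of the per-call input (a dictionary keyed by the file parameter name and merged with the additional parameters, or the bare file value when no name is given) is a guess based on the examples. The sketch also assumes that a single-file target receives one call per file within a batch, and it returns the consolidated results as a list instead of writing a result file.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def build_call_input(files: Any, file_param: str, additional_params: str) -> Any:
    """Assumed payload shape: named file parameter plus additional parameters."""
    if not file_param:
        return files  # unnamed parameter: pass the file (or list) directly
    # Assumption: the additional-parameters string is a JSON object body
    # without the surrounding braces, as in the README examples.
    extra: Dict[str, Any] = json.loads("{" + additional_params + "}") if additional_params else {}
    return {file_param: files, **extra}


def run_batches(files: List[str], batch_size: int, file_param: str,
                additional_params: str, takes_file_list: bool,
                call_target: Callable[[Any], Any]) -> List[Any]:
    """Process batches in parallel threads and return results in input order."""
    batches = [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

    def process(batch: List[str]) -> List[Any]:
        if takes_file_list:
            # One call per batch, passing the whole list of files.
            return [call_target(build_call_input(batch, file_param, additional_params))]
        # One call per file within the batch.
        return [call_target(build_call_input(f, file_param, additional_params)) for f in batch]

    with ThreadPoolExecutor(max_workers=max(1, len(batches))) as pool:
        per_batch = list(pool.map(process, batches))  # map preserves batch order
    return [result for results in per_batch for result in results]


def fake_target(payload: Any) -> Any:
    """Hypothetical target algorithm that echoes its input."""
    return payload


files = [f"f{i}.jpg" for i in range(5)]
results = run_batches(files, 3, "image", '"numResults":2', False, fake_target)
print(len(results))  # 5
```

One thread per batch keeps the sketch simple; a real implementation would likely cap the worker count and handle failed calls.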