Mobile Development

III. Dependency Harvesters

Michael Zent

Published: Jan 17th, 2024

Updated: Mar 15th, 2024

This work emerged in the context of the project APK Building on Mobile [1][2]. It is meant to assist in collecting the packages and artifacts necessary to build Android applications from remote repositories onto mobile storage, though it is not limited to that. This way, the development routine, which takes up most of the time and happens predominantly offline, can be separated from the comparatively rare, usually online, dependency management.

All software presented here can be downloaded and used free of charge under the GPLv3. No responsibility whatsoever is accepted for whatever You do with this software: it is provided as-is, without any warranty. Nonetheless, Your remarks, bug reports, and suggestions are explicitly welcome at m.zent@fu-berlin.de.

The author also considers it necessary to hereby assure that neither the following text nor the ideas it comprises result even in the slightest from the use of AI tools. Accordingly, the author explicitly prohibits the use of this article, and of the software resulting from it, in automated systems without proper referencing. This applies to humans as well.

Content

1. A Small Digression

1.1. Dependency Management in a Nutshell
1.2. Airgaps in Remote Repository Access
1.3. Bridging the Airgap

1.3.1. Accessing APT Repositories
1.3.2. Accessing Maven Repositories

2. Harvest APT Repositories

2.1. Requirements
2.2. Installation
2.3. Usage

2.3.1. Options
2.3.2. Packages
2.3.3. Example

2.4. Internals
2.5. Changelog

3. Harvest Maven Repositories

3.1. Requirements
3.2. Installation
3.3. Usage

3.3.1. Options
3.3.2. Artifacts
3.3.3. Example

3.4. Internals
3.5. Changelog

Footnotes
References

Cotton Harvester, 1886 — *Cotton Harvester*, patented in Nov.1885 by Owen T. Bugg, and manufactured by *United States Cotton Harvester Co.*, New York, USA.

(*Frank Leslie's Illustrated Newspaper*, Vol.62, No.1600, pp.220-221, May 22, 1886)

1. A Small Digression

1.1. Dependency Management in a Nutshell

The work on a software project includes not only the design of its code but also the management of its Dependencies, i.e. the various kinds of Modules, aka Packages, required to build or run it. These may be other programs which are invoked to delegate certain tasks, or reusable units of associated resources to be integrated into the project. The latter can be Libraries, i.e. sets of precompiled routines that may be linked with the program at compile time or loaded dynamically at runtime, as well as Assets like raw source code, type specifications and values, images and media files, databases, etc. It is essential to note that Direct Dependencies, i.e. those which the developers themselves include directly into the project, will usually depend on further modules in turn. These are called Transitive Dependencies and have to be added to the project for a successful build, too. [3][4][5][6]
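To make the notion of transitive dependencies concrete, the following minimal sketch computes the complete set of dependencies of some modules by a breadth-first traversal of a dependency graph. The graph format (one line per module, followed by its direct dependencies), the function name, and the module names are hypothetical; real package managers additionally deal with versions, conflicts, and alternatives, which are left out here.

```shell
#!/bin/sh
# Print all direct and transitive dependencies of the given modules, each
# once and sorted. The dependency graph is read from stdin, one line per
# module: "MODULE DEP1 DEP2 ...". A breadth-first traversal with a "seen"
# set visits every reachable module exactly once, even if cycles exist.
transitive_deps() {
  awk -v start="$*" '
    { for (i = 2; i <= NF; i++) deps[$1] = deps[$1] " " $i }
    END {
      n = split(start, queue, " ")
      for (head = 1; head <= n; head++) {
        k = split(deps[queue[head]], d, " ")
        for (j = 1; j <= k; j++)
          if (!(d[j] in seen)) { seen[d[j]] = 1; queue[++n] = d[j] }
      }
      for (m in seen) print m
    }' | LC_ALL=C sort
}

# Hypothetical graph: app -> libA, libB; libA -> libC; libB -> libC, libD
printf '%s\n' 'app libA libB' 'libA libC' 'libB libC libD' |
  transitive_deps app
```

Here libA and libB are direct dependencies of app, while libC and libD are only reached transitively, yet all four have to be present for a successful build.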

Dependencies needed at compile time as well as at runtime can roughly be distinguished by whether they are native to the machine's system, i.e. its hardware and operating system. This distinction leads to different types of package managers being in charge of them.

Native dependencies are those which in any way depend on properties of the machine's hardware and operating system, in particular programs and libraries in native binary code, or which transitively depend on such. Accordingly, these will in general be handled by the system-side package management.
Non-native dependencies are first of all programs and libraries written in a non-native programming language like Java, and will usually be handled by language-specific, or more precisely language-ecosystem-specific, package management systems.

Even though assets are non-native by nature, it is rather the context they are used in that determines how they are delivered: stand-alone, or together with the native or non-native module they are associated with.

As we will see later, the boundaries between managers for native and non-native dependencies are fluid. A system-side package manager may also provide non-native components if required or meaningful, and the reverse can apply as well. But the rough overall picture is as described above.

It has to be remarked that no one is forced to employ any dependency management system at all. One could organize the provision of all dependencies without one and rely only on one's own effort. Although that may be feasible for a limited number of modules, chiefly private custom ones, this approach is not viable for larger projects: surveys on repositories for various programming languages showed that, depending on the language examined, the provided packages have a median of up to 10 direct and up to 70 transitive dependencies, which one would have to keep track of, with an upward trend. [7][8][9]

Having classified the potential dependencies, let's take a look at the tools we need to manage them. Dependency management consists of two parts, a remote and a local one. [10][11][12]

The remote one involves the Package Repositories, which are the main source for dependency modules. There is usually one central repository per language-ecosystem, plus mirrors. Besides additional public repositories, there are also private ones, for which one might need access permissions.
The other one, the local Package Manager, is the interface between the client and the remote repositories. Access to the aforementioned central repository and its mirrors is usually preconfigured, and can be extended to further public and private repositories. The package manager keeps track of the modules available in these repositories and of their dependency relations among each other. When the client requests a package, its archive file is downloaded from the remote repositories and installed into a local repository, i.e. a specified place on the local file system used to store and manage packages. In doing so, the manager automatically resolves the transitive dependencies and thus installs all indirectly needed modules, too.

So much for a short summary of Dependency Management in general. Now let's look at a specific use case where, instead of installing packages directly from a remote repository, one has to go through an intermediary.

1.2. Airgaps in Remote Repository Access

It is not unusual for developers to face a situation where they have to work on a project on a machine without access to online repositories, e.g. for security reasons. Some common scenarios are:

Airgap: The computer to work on is physically isolated from the internet because it is part of critical infrastructure, a network security measure known as Airgapping [13]. This can range from a workstation in a nuclear power plant to a secured smartphone in a company's research department. The only ways to move data to and from the secured computer are controlled storage media or internal institutional networks cut off from the internet.
Sysadmin block: The machine in question has an internet connection, but the user has no permission to invoke a dependency manager, as any installation from remote sources is considered a potential security gap and is therefore reserved for the system administrators.
Poor connectivity: The programmer has to work, at least temporarily, in a region with electricity but with no or only unstable internet connectivity. As of 2021, about 37% of the world's population did not have access to the internet [14], but only about 10% lacked access to electricity [15]. Depending on the region, web access is more likely to be wireless or hardwired, the latter e.g. via an internet cafe. Considering the roughly 690,000 programmers in Africa alone as of 2023 [16], this is surely the reality for at least some thousands of developers. And remember: it was once possible for a programmer to live without the internet at all [17].

In the following, we will discuss how to overcome the aforementioned situations, summarized for the sake of simplicity under the term Airgapping, in the light of developing an Android application on an airgapped handheld device.

1.3. Bridging the Airgap

Generally, to bridge an airgap, one has to store the necessary components intermediately, i.e. without installing them, on another machine that has remote repository access. When needed, they can then be transferred to the isolated computer, e.g. via a storage medium or a local connection. Note that the intermediary could be a handheld as well as a PC; indeed, there are more wireless than fixed broadband subscriptions in the world [18].

Not only the project's dependencies themselves but also their transitive dependencies have to be considered. Naturally, an offline build of a project demands that all needed components are available locally. Updating them online should only be necessary when changes to the project require it.

With that said, some premises have to be stated beforehand, based on the slogan Developing for Mobile on Mobile of this Mobile Development article series. Nevertheless, the described approach is in principle transferable to other scenarios.

We will focus on work on Android application projects on Android devices, and only consider the conventional approach for APK building based on Java/Kotlin and C/C++.
Therefore, it is assumed that the targeted Android OS machine, where the application project in question is placed, runs Termux as a Debian GNU/Linux environment emulator [19]. Accordingly, the intermediary, whether handheld or PC, is assumed to run a Debian derivative, too. This is a sane supposition, considering that two-thirds of all active Linux distributions are based on Debian [20].

Fig.1 illustrates our scenario. The handheld with the project is separated from the online repositories by an airgap. Suitable intermediate devices, mobile or stationary, are available. Now it's time to choose the instruments.

Fig.1. Bridging an airgap between the remote repositories and the handheld productive system via a mobile or stationary intermediary as end-point for an insecure network connection on one side and a controlled data transfer on the other.

1.3.1. Accessing APT Repositories

As laid down, the system of choice here is Debian GNU/Linux or one of its derivatives. Hence, the system-side package manager is APT [21], configured to access Termux' main and supplementary remote repositories [22]. These provide the programs needed for the APK building toolchain [23] as well as a wide range of native libraries one might want to use for a native portion of the Android application project [24]. Luckily, APT's tool apt-get provides an option to only download packages instead of installing them. Everything could then be fine: just download everything needed to the intermediate machine and ship the packages to the target handheld via any kind of trusted medium. Unfortunately, there are two obstacles.

There is no straightforward combination of options for recursively downloading transitive dependencies. This applies to APT in general, and thus affects a handheld intermediary as much as a stationary one.
Using a PC as the intermediary runs into the issue that Termux is not a regular Debian derivative but an emulator that does not comply with the Filesystem Hierarchy Standard [25]. Therefore, one cannot simply use official Debian or Ubuntu packages for its environment but has to employ Termux-specific repositories containing adapted variants.

By wrapping APT, the APT Harvester solves these issues.

1.3.2. Accessing Maven Repositories

With APT we have taken care of our basic system-related dependencies. Beyond that, Android development relies on Maven repositories [11][26] to obtain the Java (JAR) and Android (AAR) archive files a project needs. The former is the classical file format for libraries written in Java, i.e. a non-native language, and can also contain assets like images and sound [27]. The latter is structurally nearly the same as an Android application package (APK) and can contain everything one might need to build an Android application project: JAR files as well as native dynamic libraries and assets of all kinds [28]. Because of this variety, the generalizing term Artifact [29] was introduced for any kind of file in a Maven repository that can be addressed via its Coordinates [30].

And again we meet some difficulties.

The commonly used Gradle and Maven do not focus on transitive dependency resolution and download alone but are comprehensive build automation tools, which are not needed on the intermediate machine. Also, one has to take into consideration that storage may be scarce when a mobile device serves as the intermediary. Yet the most basic setup of Gradle needs 150MB, and that of Maven 40MB, without taking their dependencies into account. Luckily, there is an alternative: the Apache Ant project has split the dependency management functionality off from build automation into Apache Ivy [31], which needs only slightly more than 1MB. Alas, Ivy is available as a Debian package for PC systems only, not for mobile devices.
Due to their extensive functionality, Gradle and Maven require laborious configuration via scripts or XML files, respectively. Though Ivy is far slimmer, it still needs XML files to be tinkered with. This extra layer of complexity makes these tools less approachable.
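To give an impression of the XML Ivy expects, the following sketch prints a minimal settings file that queries several Maven repositories in turn, using a chain of Maven-2-compatible ibiblio resolvers. The function name make_ivysettings and the resolver names are hypothetical, and this is not necessarily the configuration zs-mvn-harvester generates; the URLs used in the example are Google's Maven repository and Maven Central.

```shell
#!/bin/sh
# Print a minimal Ivy settings file that queries the given Maven
# repositories in order (one root URL per argument). URLs are inserted
# verbatim, so they must not contain XML special characters.
make_ivysettings() {
  local i=0 url
  echo '<ivysettings>'
  echo '  <settings defaultResolver="repos"/>'
  echo '  <resolvers>'
  echo '    <chain name="repos">'
  for url in "$@"; do
    printf '      <ibiblio name="repo%d" m2compatible="true" root="%s"/>\n' \
      "$i" "$url"
    i=$((i + 1))
  done
  echo '    </chain>'
  echo '  </resolvers>'
  echo '</ivysettings>'
}

# Example: Google's Maven repository, then Maven Central
make_ivysettings https://maven.google.com https://repo1.maven.org/maven2
```

In Ivy's standalone mode, such a file can be handed over via the -settings option; hiding exactly this kind of boilerplate is what a wrapper can do for the user.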

The Maven Harvester addresses these problems by wrapping Ivy, thereby also bringing it to mobile devices using Termux, and by hiding its complexity from the user.

2. Harvest APT Repositories

A widespread package manager is apt, which is used by Debian, its derivatives, and also by Termux [32]. Unfortunately, and in contrast to e.g. Python's pip [33], there is no straightforward combination of options to download packages recursively together with their whole transitive dependency trees [34]. The APT Harvester wraps apt to provide this functionality.

2.1. Requirements

A machine running Android 7.0+. A version for Debian GNU/Linux and its derivatives [35] is coming.
On Android the application Termux is needed. A fresh installation will need ~100MB.
An internet connection is required for the APT Harvester to fulfill its purpose.

2.2. Installation

On Android:

If not yet done, install Termux and give it access to the internal storage by invoking
```
$ termux-setup-storage
```
Download the APT Harvester package
zs-apt-harvester_1.1.0_all_andro.deb (15kB)
Last tested 2024-01-14 on Android 8.0 with apt 2.7.7
Install the APT Harvester in Termux, e.g. with
```
$ apt install /storage/emulated/0/Download/zs-apt-harvester_1.1.0_all_andro.deb
```
Note: The required packages apt and bash, along with their dependencies, are already shipped with Termux. If necessary, update bash to at least version 4.1.

If everything went well, a call of apt-cache show zs-apt-harvester will yield the following:

```
Package: zs-apt-harvester
Status: install ok installed
Installed-Size: 43
Maintainer: Michael Zent 
Architecture: all
Version: 1.1.0
Depends: apt, bash (>= 4.1)
Description: APT Repository Harvester, e.g. for offline APK building
Description-md5:
```

On Debian GNU/Linux and its derivatives:

Coming soon.

2.3. Usage

The general use pattern is as follows.

```
$ zs-apt-harvester [options] [packages]
```

2.3.1. Options

```
$ zs-apt-harvester -D <dir_path> -f <file_path> -L <file_path>
```
The -D option specifies the directory where the downloaded packages are placed. If it is absent, the current working directory $PWD is used by default.
The -f option specifies a file with the list of packages, identified by name, to be retrieved. The package names are separated by whitespace characters of any kind and number, so the list can be formatted as one likes. If no file has been specified, the APT Harvester by default looks for a file named wish.list in the download directory. As part of its output, zs-apt-harvester furthermore creates, or overwrites if it already exists, a wish.list file in the download directory for later reuse, preserving all of the user's wishes that led to the given result, including those given on the command line.
Use -L to specify the file the log information is written to. If not specified, this is log.txt in the download directory. An existing file of the same name is overwritten.
If -D, -f, or -L is used multiple times, only the last occurrence is considered.
```
$ zs-apt-harvester --no-update
```
Disables the APT update, which by default is carried out to check the availability of the requested packages against an up-to-date package list. When the update is not needed, e.g. because apt update has been executed recently, disabling this step speeds up the program.
```
$ zs-apt-harvester -h -V
```
Print helpful information with -h, and the program's version with -V. When used together, -h supersedes -V. Either of them overrides all other options, i.e. the program does nothing except print the requested information.

Providing options is not mandatory. When no options are present, the default settings apply. So a call of zs-apt-harvester without any options is equivalent to the following:

```
$ zs-apt-harvester -D $PWD -f $PWD/wish.list -L $PWD/log.txt
```

Specifying only a download directory dwndir is equivalent to:

```
$ zs-apt-harvester -D dwndir -f dwndir/wish.list -L dwndir/log.txt
```
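The option semantics described above (the last occurrence of -D, -f, or -L wins, and unset paths fall back to the download directory) could be implemented roughly as follows. This is a hypothetical sketch written for illustration, not the source of zs-apt-harvester; the function name and the printed key names are assumptions, and -h and -V are omitted for brevity.

```shell
#!/bin/sh
# Hypothetical option parser mirroring the documented behavior:
# repeated -D/-f/-L options overwrite earlier ones, remaining arguments
# are package wishes, and unset paths default relative to the download dir.
parse_args() {
  local down_dir='' wish_file='' log_file='' update=1 cli_wishes=''
  while [ "$#" -gt 0 ]; do
    case "$1" in
      -D|-f|-L)
        if [ "$#" -lt 2 ]; then
          echo "option $1 requires an argument" >&2
          return 1
        fi
        case "$1" in
          -D) down_dir="$2" ;;
          -f) wish_file="$2" ;;
          -L) log_file="$2" ;;
        esac
        shift 2 ;;
      --no-update) update=0; shift ;;
      -*) echo "unknown option: $1" >&2; return 1 ;;
      *) cli_wishes="${cli_wishes:+$cli_wishes }$1"; shift ;;
    esac
  done
  # Defaults as described in the text.
  down_dir="${down_dir:-$PWD}"
  wish_file="${wish_file:-$down_dir/wish.list}"
  log_file="${log_file:-$down_dir/log.txt}"
  printf 'DownDir=%s\nWishFile=%s\nLogFile=%s\nUpdate=%s\nCliWishes=%s\n' \
    "$down_dir" "$wish_file" "$log_file" "$update" "$cli_wishes"
}

parse_args -D /tmp/a -D /tmp/b sqlite
```

For the last call, the second -D wins, so DownDir becomes /tmp/b and the wish and log files default to /tmp/b/wish.list and /tmp/b/log.txt.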

2.3.2. Packages

The packages to be downloaded together with all their dependencies have to be identified by name. They can be provided via a file as well as on the command line, separated by any kind and number of whitespace characters. If no -f option explicitly specifies a file to read from, the APT Harvester looks for a file named wish.list in the download directory. Additionally, all packages named on the command line are taken into consideration.

Afterwards, all retrieved .deb package files can be found in the download directory and are listed in the accompanying file pkg.list. All the initial package wishes are stored there as well, in the file wish.list.

2.3.3. Example

Let's assume one wants to harvest all packages needed for the XAMPP web server solution stack, e.g. into the directory ~/XAMPP. Then one could call:

```
$ zs-apt-harvester -D ~/XAMPP apache2 mariadb php perl
```

To avoid unnecessary retyping, one could create a file ~/XAMPP/wish.list containing the line apache2 mariadb php perl. Note that the package names can actually be separated by any kind and number of whitespace characters to format the file as one likes. As zs-apt-harvester by default looks for a file wish.list in the download directory, the corresponding invocation could then be reduced to:

```
$ zs-apt-harvester -D ~/XAMPP
```

To keep the wish list separately, e.g. as xampp.list in a directory ~/catalog, one can refer to it with the option -f:

```
$ zs-apt-harvester -D ~/XAMPP -f ~/catalog/xampp.list
```

If one is interested in an additional package not listed in the given file but for some reason does not want to alter it, one can simply add the package in question on the command line, e.g.:

```
$ zs-apt-harvester -D ~/XAMPP -f ~/catalog/xampp.list sqlite
```

Executing that command should yield the result shown in Fig.2.

Fig.2. Final output after running zs-apt-harvester -D ~/XAMPP -f ~/catalog/xampp.list sqlite to retrieve the XAMPP packages apache2, mariadb, php and perl as written in the xampp.list file, plus sqlite, into the directory XAMPP.

Looking into the download directory afterwards, e.g. with ls ~/XAMPP, one will see the downloaded .deb files and the text files generated by zs-apt-harvester:

pkg.list names all the retrieved files from apache2_1%3a2.4.58_aarch64.deb to zstd_1.5.5-1_aarch64.deb.
wish.list contains, for reuse, the user's wishes from the last run on the download directory, sorted alphabetically, so here apache2 mariadb perl php sqlite.
log.txt preserves the program's screen output for later reference.

2.4. Internals

The execution flow of the APT Harvester is depicted in Fig.3, comprising three phases — program setup, collection of the user's package wishes, and the organization of the actual downloads, including all dependencies.

Fig.3. The 3-phased approach of the APT Harvester to retrieve packages from APT repositories.

Setup

First of all, it has to be ensured that the APT Harvester is operative, i.e. that the program's dependencies as well as an internet connection are available; otherwise, the program terminates. Next, the core settings that control the further program behavior are initialized with default values. For other needs, custom values can be provided via the command line options of the APT Harvester. This way, the following parameters, as represented in Fig.3, are defined:

DownDir names the download directory to store the retrieved packages into,
WishFile and CliWishes list the names of the packages to be retrieved, specified by file and/or command line,
LogFile names the file to store the program's logging information, and
Update specifies whether to update the local APT cache containing information about the package stock in the configured remote APT repositories.

The WishList, still empty at this point, will later list the names of all packages to be harvested, including their transitive dependencies. After logging has started and the existence of DownDir has been ensured, the next phase is initiated.
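The dependency part of this setup phase could be sketched as below: one helper compares a major/minor version against a minimum (e.g. bash 4.1, as required by the package), the other verifies that the needed commands are available on the PATH. Both function names are hypothetical, and the internet connection check is left out.

```shell
#!/bin/sh
# Succeed if version MAJOR.MINOR ($1.$2) is at least REQ_MAJOR.REQ_MINOR ($3.$4).
version_at_least() {
  [ "$1" -gt "$3" ] || { [ "$1" -eq "$3" ] && [ "$2" -ge "$4" ]; }
}

# Succeed if every named command is on the PATH; report each missing one.
have_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use inside a Bash script (hypothetical):
#   version_at_least "${BASH_VERSINFO[0]}" "${BASH_VERSINFO[1]}" 4 1 || exit 1
#   have_commands apt-get apt-cache || exit 1
```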

Get Wishes

As mentioned, one can specify the packages one wishes to download via a WishFile as well as via CliWishes. The contents of both, if present, are merged into the WishList. If no wishes at all have been expressed, the program terminates. Otherwise, the wishes are sorted alphabetically, duplicates are removed, and the result is written to the file wish.list in the download directory. With that, the WishList includes all immediate demands of the user, but no dependencies yet.
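The merging step lends itself to a short pipeline: split everything on arbitrary whitespace, drop empty entries, then sort and deduplicate. The following is a minimal sketch with a hypothetical function name, assuming package names never contain whitespace.

```shell
#!/bin/sh
# Merge wishes from an optional file ($1) and from the remaining arguments:
# split on any whitespace, drop empty entries, sort, and remove duplicates.
merge_wishes() {
  local wish_file="$1"
  shift
  {
    if [ -f "$wish_file" ]; then cat -- "$wish_file"; fi
    printf '%s\n' "$@"
  } | tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort -u
}

# Usage sketch (hypothetical variable names):
#   wishes="$(merge_wishes "$down_dir/wish.list" "$@")"
#   [ -n "$wishes" ] || { echo "no wishes given" >&2; exit 1; }
#   printf '%s\n' "$wishes" > "$down_dir/wish.list"
```

Capturing the result in a variable before writing it back allows the same wish.list to serve as input and output of a run.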

Get Packages

Before a single package is downloaded, the WishList is cross-checked against the list of packages available in the configured APT repositories. Invoking apt-get update refreshes the locally cached APT package list; this step can be skipped using the --no-update option. Querying apt-cache pkgnames then yields a full list of currently known, available packages. If any package in the WishList cannot be found there, the request is considered invalid and the program terminates.
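This validity check amounts to a set difference between the wished-for names and the available ones. A minimal sketch, assuming the available names (e.g. the output of apt-cache pkgnames) have been saved to a file beforehand; missing_packages is a hypothetical name.

```shell
#!/bin/sh
# Read wished-for package names from stdin (one per line) and print those
# that do not appear as an exact line in the file of available names ($1).
missing_packages() {
  sed '/^$/d' | grep -Fxv -f "$1" || true
}

# Usage sketch (hypothetical file names):
#   apt-cache pkgnames > available.txt
#   missing="$(missing_packages available.txt < wish.list)"
#   [ -z "$missing" ] || { echo "unknown: $missing" >&2; exit 1; }
```

Fixed-string, whole-line matching (-F -x) avoids false hits such as php matching php-cli, and the trailing || true keeps the function successful when nothing is missing.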

Applying apt-cache depends to the now validated WishList extends it with all transitive dependencies. The WishList, finalized this way, can then be fed to apt-get download to harvest everything needed from the APT repositories. The resulting output placed into DownDir is:

The wished-for packages and all their transitive dependencies, i.e. the fetched .deb files.
The file pkg.list is generated, listing all the retrieved package files alphabetically.
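One way to approximate this resolution step with standard APT tools is to let apt-cache depends recurse and to keep only the top-level lines of its output, which name real packages, whereas virtual packages appear in angle brackets. The sketch below assumes this output format and this selection of --no-* flags (excluding recommends, suggests, conflicts, breaks, replaces, and enhances relations); whether the APT Harvester uses exactly these flags is not stated in the text, and both function names are hypothetical. Only the parsing helper runs without APT; harvest needs APT and a network connection.

```shell
#!/bin/sh
# Extract real package names from `apt-cache depends --recurse` output:
# lines without leading whitespace name packages, <name> marks a virtual one.
depends_to_pkgnames() {
  grep -E '^[^[:space:]<]' | LC_ALL=C sort -u
}

# Resolve the given packages recursively, download everything into the
# directory $1, and list the fetched files alphabetically in pkg.list.
harvest() {
  local down_dir="$1" pkgs
  shift
  pkgs="$(apt-cache depends --recurse --no-recommends --no-suggests \
      --no-conflicts --no-breaks --no-replaces --no-enhances "$@" |
    depends_to_pkgnames)"
  [ -n "$pkgs" ] || return 1
  # Package names contain no whitespace, so splitting $pkgs is intended.
  (cd "$down_dir" && apt-get download $pkgs) || return 1
  (cd "$down_dir" && ls -1 -- *.deb) | LC_ALL=C sort > "$down_dir/pkg.list"
}
```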

In a final step, all temporary files are cleaned up and logging is concluded. Unless another path has been specified, the LogFile is stored in DownDir as well.

2.5. Changelog

1.1.0 — 14.01.2024

Renamed apt-down to zs-apt-harvester.
Introduced logging, added the corresponding -L option.
Result file pkg.list now contains a list of the actually retrieved .deb package files, not just a list of the names of packages intended to be fetched. For the latter, the file wish.list is now written.
Now apt-get update is executed to check the availability of requested packages against an up-to-date package list. This behavior can be changed using the option --no-update.
Added checks, esp. on the availability of dependencies, to ensure the integrity of the environment.
Removed the -r option for recursively checking directories for default wish.list files, for reasons of consistency.

3. Harvest Maven Repositories

One basic step in application development is to record the modules the project depends on, so that they can be included during the build process. Third-party module files usually have to be retrieved from remote repositories before they can be employed in the local build process. In doing so, it is necessary to consider their whole dependency tree, i.e. the transitive dependencies as well.

The Maven Harvester is a tool focusing on transitive module retrieval from Apache Maven repositories [11]. Though its main purpose initially was to assist in providing dependencies for building APKs with zs-apkbuilder, it can be applied to any Maven packages of interest.

To accomplish this, it employs Apache Ivy [31], a tool for a variety of project dependency management tasks. However, zs-mvn-harvester hides Ivy's complexity and provides a simple interface that concentrates on transitive dependency resolution and serves the results in a ready-to-use way.

3.1. Requirements

A machine running Android 7.0+, or a Debian derivative [35].
On Android the application Termux is needed. A fresh installation will need ~100MB.
For the Maven Harvester and all its dependencies ~250MB should be available.^(a)
An internet connection is required for zs-mvn-harvester to fulfill its purpose.

3.2. Installation

On Android:

If not yet done, install Termux and give it access to the internal storage by invoking
```
$ termux-setup-storage
```
Download the Maven Harvester package
zs-mvn-harvester_1.2.0_all_andro.deb (1.3MB)
Last tested 2024-03-13 on Android 8.0 with ivy 2.5.2
Install the Maven Harvester in Termux with the following command, assuming the .deb file is located e.g. in the device's Download folder:
```
$ apt install --no-install-recommends /storage/emulated/0/Download/zs-mvn-harvester_1.2.0_all_andro.deb
```
Note: During installation, if an internet connection is available, the package manager apt automatically downloads all packages the Maven Harvester directly and transitively depends on from the Termux repositories and installs them immediately. If this is unwanted, one can do all that oneself, for which the APT Harvester is very useful. In that case, the first-level dependencies openjdk-17 and xmlstarlet, together with their transitive dependencies, have to be installed locally before zs-mvn-harvester itself. If necessary, also update bash to at least version 4.1.

If everything went well, a call of apt-cache show zs-mvn-harvester will yield the following:

```
Package: zs-mvn-harvester
Status: install ok installed
Installed-Size: 1424
Maintainer: Michael Zent 
Architecture: all
Version: 1.2.0
Depends: bash (>= 4.1), openjdk-17, xmlstarlet
Suggests: zs-apk-builder
Description: Maven Repository Harvester, e.g. for offline APK building
Description-md5:
```

On Debian GNU/Linux and its derivatives:

Download the Maven Harvester package
zs-mvn-harvester_1.2.0_all_linux.deb (1.3MB)
Last tested 2024-03-13 on antiX 22 (Debian GNU/Linux 11) with ivy 2.5.2
Install the Maven Harvester via the Bash prompt with the following command, assuming the .deb file is located e.g. in the $HOME folder:
```
$ sudo apt install --no-install-recommends $HOME/zs-mvn-harvester_1.2.0_all_linux.deb
```
Note: During installation, if an internet connection is available, the package manager apt automatically downloads all packages the Maven Harvester directly and transitively depends on from the Debian repositories and installs them immediately. If this is unwanted, one can do all that oneself, for which the APT Harvester is very useful. In that case, the first-level dependencies openjdk-17-jre-headless and xmlstarlet, together with their transitive dependencies, have to be installed locally before zs-mvn-harvester itself. If necessary, also update bash to at least version 4.1.

If everything went well, a call of apt-cache show zs-mvn-harvester will yield the following:

```
Package: zs-mvn-harvester
Status: install ok installed
Installed-Size: 1424
Maintainer: Michael Zent 
Architecture: all
Version: 1.2.0
Depends: bash (>= 4.1), openjdk-17-jre-headless, xmlstarlet
Recommends: zs-apkbuilder
Description: Maven Repository Harvester, e.g. for offline APK building
Description-md5: 53ca0e0fb1d369df8fa11163acaffeaa
```

3.3. Usage

The general use pattern is as follows.

```
$ zs-mvn-harvester [options] [artifacts]
```

3.3.1. Options

```
$ zs-mvn-harvester -D <dir_path> -f <file_path> -R <file_path> -L <file_path>
```
The -D option specifies the directory where the downloaded artifacts are placed. If it is absent, the current working directory $PWD is used by default.
The -f option specifies a file with the list of artifacts, identified by their Maven coordinates, to be retrieved. The artifacts are separated by whitespace characters of any kind and number, so the list can be formatted as one likes. If no file has been specified, the Maven Harvester by default looks for a file named wish.list in the download directory. As part of its output, zs-mvn-harvester furthermore creates, or overwrites if it already exists, a wish.list file in the download directory for later reuse, preserving all of the user's wishes that led to the given result, including those given on the command line.
The -R option specifies a file listing the Maven repositories to be queried. When it is unspecified, a file named repo.list in the download directory is used by default. If no such file can be found, it is auto-generated and configured for the repositories https://maven.google.com and https://repo1.maven.org/maven2.
Use -L to specify the log file the output is written to. If not specified, this is log.txt in the download directory. An existing file of the same name is overwritten.
If -D, -f, -R, or -L is used multiple times, only the last occurrence is considered.
```
$ zs-mvn-harvester --no-connection-check
```
Disables the preliminary check for an existing internet connection. Use this especially when affected by QEMU's ping issue [36].
```
$ zs-mvn-harvester --time
```
Measures and prints the program's execution time per task.
```
$ zs-mvn-harvester --topo-sort
```
Sorts the list of resolved artifacts in the output file libs.list topologically. This is useful whenever dependencies have to be fed in their topological order, beginning with the lowest-level one.
When this option is not set, libs.list is sorted alphanumerically by default.
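A topological order of this kind can be produced with the POSIX tsort utility. In the sketch below, input lines of the hypothetical form ARTIFACT DEPENDENCY (meaning ARTIFACT depends on DEPENDENCY) are reversed, because tsort places the first item of each pair before the second; a line with a single name declares an artifact without dependencies. The function name and input format are assumptions, and whether zs-mvn-harvester uses tsort internally is not stated here.

```shell
#!/bin/sh
# Read "ARTIFACT DEPENDENCY" lines and print all artifacts lowest-level
# first, i.e. every dependency before the artifacts that need it.
# A single name on a line declares an artifact without dependencies.
lowest_first() {
  awk 'NF == 2 { print $2, $1 } NF == 1 { print $1, $1 }' | tsort
}

# Hypothetical example: app needs core and lifecycle, lifecycle needs core
printf '%s\n' 'app core' 'app lifecycle' 'lifecycle core' 'standalone' |
  lowest_first
```

If the input contains a dependency cycle, no fully valid order exists, and tsort reports the loop on standard error.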
```
$ zs-mvn-harvester -v
```
Enables verbose output.
```
$ zs-mvn-harvester -h -V
```
Print helpful information with -h, and the program's version with -V. When used together, -h supersedes -V. Either of them overrides all other options, i.e. the program does nothing except print the requested information.

Providing options is not mandatory. When no options are present, the default settings apply. So a call of zs-mvn-harvester without any options is equivalent to the following:

```
$ zs-mvn-harvester -D $PWD -f $PWD/wish.list -R $PWD/repo.list -L $PWD/log.txt
```

Specifying only a download directory dwndir is equivalent to:

```
$ zs-mvn-harvester -D dwndir -f dwndir/wish.list -R dwndir/repo.list -L dwndir/log.txt
```

3.3.2. Artifacts

The artifacts to be downloaded together with their dependencies have to be identified by their Maven coordinates in the colon-separated form groupId:artifactId:version [30]. They can be provided via a file as well as on the command line, separated by any kind and number of whitespace characters. If no -f option explicitly specifies a file to read from, the Maven Harvester looks for a file named wish.list in the download directory. Additionally, all artifacts named on the command line are taken into consideration.
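Coordinates map directly onto the directory layout of Maven repositories: the dots of the group ID become slashes, followed by the artifact ID, the version, and a file named artifactId-version.extension. A coordinate can thus be turned into a relative download path as sketched below. The function name is hypothetical, and the file extension (jar, aar, pom, ...) has to be supplied separately because it is not part of the groupId:artifactId:version triple.

```shell
#!/bin/sh
# Map coordinates "groupId:artifactId:version" ($1) and an extension
# ($2, default jar) to the relative path in the Maven 2 repository layout.
coord_to_path() {
  local coord="$1" ext="${2:-jar}" group rest artifact version
  case "$coord" in
    *:*:*:*|:*|*::*|*:) echo "invalid coordinates: $coord" >&2; return 1 ;;
    *:*:*) ;;
    *) echo "invalid coordinates: $coord" >&2; return 1 ;;
  esac
  group="${coord%%:*}"
  rest="${coord#*:}"
  artifact="${rest%%:*}"
  version="${rest#*:}"
  printf '%s/%s/%s/%s-%s.%s\n' "$(printf '%s' "$group" | tr . /)" \
    "$artifact" "$version" "$artifact" "$version" "$ext"
}

coord_to_path androidx.appcompat:appcompat:1.4.2 aar
# prints androidx/appcompat/appcompat/1.4.2/appcompat-1.4.2.aar
```

Prefixing the result with a repository root such as https://repo1.maven.org/maven2/ yields the artifact's URL in that repository.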

Afterwards, all retrieved artifact files can be found in the download directory. The accompanying file libs.list names them all and can, e.g., be fed to zs-apkbuilder to determine an APK project's dependencies. Correspondingly, failed retrievals are listed in the file miss.list, which helps in tracking down errors in the dependency tree and in retrieving artifacts manually if need be. The initial artifact wishes are preserved in the file wish.list.

In addition, the archive cache.tar.gz stores retrieval data generated by Ivy, so that already present artifacts are not fetched again, saving time and internet traffic. If the archive is deleted, the dependency resolution process has to start from scratch, and all artifacts are downloaded anew.
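Persisting a cache directory between runs as a compressed archive, in the spirit of cache.tar.gz, can be sketched with two small functions. The function names and the cache directory location are assumptions for illustration; how zs-mvn-harvester lays out Ivy's data inside its archive is not described here.

```shell
#!/bin/sh
# Unpack a previously saved cache archive ($1) into the cache directory
# ($2) before a run; without an archive, start with an empty cache.
restore_cache() {
  local archive="$1" cache_dir="$2"
  mkdir -p -- "$cache_dir" || return 1
  if [ -f "$archive" ]; then
    tar -xzf "$archive" -C "$cache_dir"
  fi
}

# Pack the cache directory ($2) into the archive ($1) after a run.
# The archive must lie outside the cache directory.
save_cache() {
  local archive="$1" cache_dir="$2"
  tar -czf "$archive" -C "$cache_dir" .
}
```

Deleting the archive then simply leads to an empty cache on the next restore, matching the described fallback of resolving and downloading everything anew.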

3.3.3. Example

Let's assume one wants to harvest all artifacts needed for the project PdfRendererBasic from the Android Graphics Samples [37], e.g. into the directory ~/PdfRendererBasic/deps. Then one could call:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps androidx.appcompat:appcompat:1.4.2 androidx.arch.core:core-testing:2.1.0 androidx.fragment:fragment-ktx:1.5.1 androidx.lifecycle:lifecycle-livedata-ktx:2.5.1 androidx.lifecycle:lifecycle-viewmodel-ktx:2.5.1 com.google.truth:truth:1.0 org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.7.10 org.jetbrains.kotlinx:kotlinx-coroutines-android:1.6.1

To avoid unnecessary retyping one could create a file ~/PdfRendererBasic/deps/wish.list containing the entries

androidx.appcompat:appcompat:1.4.2
androidx.arch.core:core-testing:2.1.0
androidx.fragment:fragment-ktx:1.5.1
androidx.lifecycle:lifecycle-livedata-ktx:2.5.1
androidx.lifecycle:lifecycle-viewmodel-ktx:2.5.1
com.google.truth:truth:1.0
org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.7.10
org.jetbrains.kotlinx:kotlinx-coroutines-android:1.6.1

Note that one can actually separate the artifact coordinates with any kind and number of whitespace characters, to format the file as one likes. As zs-mvn-harvester by default looks for a file wish.list in the download directory, the corresponding invocation can then be reduced to:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps

When one wants to keep the wish list separately, e.g. as PdfRendererBasic.list in a directory ~/catalog, one can point to it with the option -f:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps -f ~/catalog/PdfRendererBasic.list

If one is interested in additional artifacts not listed in the given file but, for some reason, does not want to alter it, one can simply add the artifacts in question on the command line. Here, e.g., one could add some libraries employed for program testing:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps -f ~/catalog/PdfRendererBasic.list androidx.test.espresso:espresso-core:3.4.0 androidx.test.ext:junit:1.1.3 androidx.test.ext:truth:1.4.0 androidx.test:core:1.4.0

Executing that command should yield the result shown in Fig.4.

Fig.4. Final report after zs-mvn-harvester -D ~/PdfRendererBasic/deps -f ~/catalog/PdfRendererBasic.list androidx.test.espresso:espresso-core:3.4.0 androidx.test.ext:junit:1.1.3 androidx.test.ext:truth:1.4.0 androidx.test:core:1.4.0 has been run to retrieve the dependencies of the PdfRendererBasic project, as recorded in the corresponding .list file, plus testing libraries, into the project's local dependency directory deps. Note that #Resolved + #Unresolved = modules.number - modules.evicted holds.

Looking into the download directory afterwards, e.g. with ls ~/PdfRendererBasic/deps, one will see the downloaded artifacts and further files generated by zs-mvn-harvester.

libs.list names the retrieved files, here from androidx.activity:activity-ktx:1.5.1 to org.w3c.css:sac:1.3. When the option --topo-sort is applied, the list is sorted topologically instead of alphanumerically.
The coordinates of unresolved dependencies are written to miss.list. Here the file is empty, as everything could be resolved.

wish.list contains the user's wishes of the last run on the download directory, merged from file and command line, sorted alphabetically and without duplicates, so here:

androidx.appcompat:appcompat:1.4.2
androidx.arch.core:core-testing:2.1.0
androidx.fragment:fragment-ktx:1.5.1
androidx.lifecycle:lifecycle-livedata-ktx:2.5.1
androidx.lifecycle:lifecycle-viewmodel-ktx:2.5.1
androidx.test.espresso:espresso-core:3.4.0
androidx.test.ext:junit:1.1.3
androidx.test.ext:truth:1.4.0
androidx.test:core:1.4.0
com.google.truth:truth:1.0
org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.7.10
org.jetbrains.kotlinx:kotlinx-coroutines-android:1.6.1

repo.list includes the Maven repositories used. As no others have been specified, these are the default ones:
```
https://maven.google.com
https://repo1.maven.org/maven2
```
In the same format one can create a customized repo.list file and feed it to zs-mvn-harvester with the option -R.
log.txt preserves the program's screen output for later review.
cache.tar.gz archives information from the retrieval process which helps to avoid retransmissions in further runs on that download directory.

3.4. Internals

As depicted in Fig.5, three phases have to be passed to fulfil the user's wishes — program setup, wish collection and the actual download of the desired artifacts and all their dependencies.

Fig.5. The 3-phased approach of the Maven Harvester to retrieve packages from Maven repositories.

Setup

First, it has to be ensured that zs-mvn-harvester is operative, i.e. that the program's dependencies as well as an internet connection are available; otherwise the program terminates. (The connection check can be skipped with the option --no-connection-check, see the Changelog.) Next, the core settings which control further program behavior are initialized with default values. If one has other needs, one may provide custom values via the command line options of zs-mvn-harvester as described in Usage. This way the following parameters are defined.

DownDir names the target directory to store the downloaded artifacts into,
WishFile and CliWishes list the Maven coordinates of the artifacts to be retrieved, specified by file and/or command line,
LogFile names the file to store the program's logging information, and
RepoFile names the file which lists URLs pointing to the public remote Maven repositories which shall be requested. Note that the repositories are polled in list order, i.e. it is recommended to sort the URLs by descending hit probability.
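The text does not detail how the internet connection is verified during Setup. The following Python sketch assumes that opening a TCP connection to a repository host is a sufficient test. The function name has_connection and the default host and port are hypothetical choices, not zs-mvn-harvester's actual implementation.

```python
import socket


def has_connection(host="repo1.maven.org", port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failures, refused connections and timeouts
        return False
```

Inside QEMU's user-mode networking, such probes can fail even though downloads work. This would be a situation for skipping the check, as the Changelog describes.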

Then logging starts. After ensuring that DownDir actually exists, zs-mvn-harvester checks it for the archive cache.tar.gz and, if available, unpacks it. The archive contains property files generated by Ivy for each module it has loaded from a repository into this directory in previous runs. By means of these, Ivy will be able in the next run to reconcile the user's wishes with the current content of DownDir and avoid unnecessary re-downloads, as described below.
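The cache round trip can be sketched in Python with the standard tarfile module. The archive is unpacked at start-up if present and rewritten from Ivy's property files at the end. The function names and the assumption that property files are passed as paths relative to DownDir are hypothetical. Which files Ivy actually writes is not shown here.

```python
import os
import tarfile

CACHE = "cache.tar.gz"


def restore_cache(downdir):
    """Unpack cache.tar.gz into downdir if present; return True if unpacked."""
    path = os.path.join(downdir, CACHE)
    if not os.path.isfile(path):
        return False  # no cache: resolution starts from scratch
    with tarfile.open(path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # safer extraction where available
            tar.extractall(downdir, filter="data")
        else:
            tar.extractall(downdir)
    return True


def save_cache(downdir, property_files):
    """Archive the given property files (paths relative to downdir)."""
    with tarfile.open(os.path.join(downdir, CACHE), "w:gz") as tar:
        for rel in property_files:
            tar.add(os.path.join(downdir, rel), arcname=rel)
```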

Get Wishes

As mentioned, one can specify the artifacts to be downloaded via a WishFile as well as via CliWishes. The contents of both, if present, are merged into a WishList. If no wishes have been expressed, the program terminates. Otherwise the WishList is sorted alphabetically, freed of duplicates, written to the file wish.list in the download directory, and translated into the Ivy Module Descriptor File ivy.xml.
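The following Python sketch covers merging, sorting, de-duplicating, writing wish.list and emitting a minimal Ivy module descriptor with one dependency element per wish. The function names are hypothetical. So are the placeholder organisation and module values of the requesting module in the info element. zs-mvn-harvester's real ivy.xml may carry more attributes. Plain code-point sorting happens to reproduce the order of the example wish.list shown earlier.

```python
import os
import xml.etree.ElementTree as ET


def build_wishlist(file_wishes, cli_wishes):
    """Merge both sources, drop duplicates, sort alphabetically."""
    return sorted(set(file_wishes) | set(cli_wishes))


def write_wish_files(downdir, wishlist):
    """Write wish.list and a minimal Ivy module descriptor ivy.xml."""
    with open(os.path.join(downdir, "wish.list"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(wishlist) + "\n")
    module = ET.Element("ivy-module", version="2.0")
    # organisation/module of the requesting module: hypothetical placeholders
    ET.SubElement(module, "info", organisation="local", module="harvest")
    deps = ET.SubElement(module, "dependencies")
    for coord in wishlist:
        group, artifact, version = coord.split(":")
        ET.SubElement(deps, "dependency", org=group, name=artifact, rev=version)
    ET.ElementTree(module).write(os.path.join(downdir, "ivy.xml"),
                                 encoding="utf-8", xml_declaration=True)
```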

Get Artifacts

First, the remote repositories from which the artifacts shall be retrieved have to be configured. If a user-defined RepoFile is available, the list of repository URLs is read from there. Otherwise a default RepoFile is written into DownDir, pointing to Google's Maven Repository and the Maven Central Repository, in this order, as Android application projects more frequently need artifacts from the former. Finally, the repositories are written into the Ivy Settings File ivysettings.xml, together with DownDir, to tell Ivy where to put the results.
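This repository configuration step can be sketched in Python using Ivy's chain resolver with one Maven-2-compatible ibiblio resolver per URL. The resolvers are kept in list order, and returnFirst makes the chain stop at the first repository that has a module. The function names, the resolver names, and passing DownDir as the caches element's defaultCacheDir are assumptions. How zs-mvn-harvester actually conveys DownDir to Ivy is not stated in the text.

```python
import os
import xml.etree.ElementTree as ET

DEFAULT_REPOS = ["https://maven.google.com", "https://repo1.maven.org/maven2"]


def read_repos(repofile):
    """Return repository URLs in file order, writing the defaults if absent."""
    if not os.path.isfile(repofile):
        with open(repofile, "w", encoding="utf-8") as fh:
            fh.write("\n".join(DEFAULT_REPOS) + "\n")
    with open(repofile, encoding="utf-8") as fh:
        return fh.read().split()


def write_ivysettings(path, repos, cache_dir):
    """Chain one Maven-2-compatible ibiblio resolver per URL, in list order."""
    root = ET.Element("ivysettings")
    ET.SubElement(root, "settings", defaultResolver="harvest")
    ET.SubElement(root, "caches", defaultCacheDir=cache_dir)  # assumption
    resolvers = ET.SubElement(root, "resolvers")
    chain = ET.SubElement(resolvers, "chain", name="harvest", returnFirst="true")
    for i, url in enumerate(repos):
        ET.SubElement(chain, "ibiblio", name=f"repo{i}", root=url, m2compatible="true")
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```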

Now the configuration is complete, consisting of the following files, serving as Ivy's input:

The Ivy Settings File ivysettings.xml tells Ivy which repositories shall be accessed in which order, and where the results have to be placed.
The Ivy Module Descriptor File ivy.xml lists the immediate artifacts demanded by the user. Ivy will fetch them, and recursively resolve and retrieve all of their transitive dependencies as well.
If a cache archive cache.tar.gz was available, Ivy Module Property Files will be present, describing the state in DownDir after the previous run, helping to avoid retransmissions of already present artifacts.

Note that zs-mvn-harvester never deletes any files; it only adds new ones. Also keep in mind that two artifact instances are considered equal only when they have exactly the same Maven coordinates. So, when an artifact is already present with a higher version but a lower version is now demanded, the request is not considered satisfied yet: the artifact of higher version remains untouched, and the one of lower version is added. An artifact that was not requested but is present in the cache and/or DownDir is not affected by the harvester.
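The equality rule can be made concrete with a small Python sketch. Only identical coordinate strings satisfy a wish, and nothing already present is ever scheduled for removal. The function name plan_downloads and the example coordinates in the usage note are hypothetical. The sketch only covers the directly wished-for artifacts; transitive resolution is left to Ivy.

```python
def plan_downloads(wishes, present):
    """Split wishes into already-present ones and ones still to fetch.

    Equality is on the full groupId:artifactId:version string, so a higher
    version on disk does not satisfy a request for a lower one. Entries in
    `present` are never removed; they are simply not part of the plan.
    """
    have = set(present)
    satisfied = [w for w in wishes if w in have]
    to_fetch = [w for w in wishes if w not in have]
    return satisfied, to_fetch
```

For instance, with androidx.core:core:1.8.0 on disk, a wish for androidx.core:core:1.7.0 still ends up in to_fetch.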

The output of Ivy is put into DownDir, though with some transformations:

The wished-for artifacts and all their transitive dependencies, i.e. the fetched .aar and .jar files, are stored as-is to DownDir within subdirectories according to their Maven coordinates.
For each fetched artifact, Ivy generates a Module Property File. These are archived into cache.tar.gz, which is stored in DownDir for the aforementioned purpose.
Ivy summarizes the results in an XML Report File, which is then evaluated for successfully retrieved and still missing artifacts. The former are written with their Maven coordinates, line by line, into the plain text file libs.list in DownDir, which can be used, e.g., as an input for zs-apkbuilder to name all dependencies of an Android application project. Correspondingly, the missing artifacts are listed in miss.list.
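By default libs.list is sorted alphanumerically. With the option --topo-sort it is sorted topologically. The method zs-mvn-harvester uses is not described here. The following Python sketch swaps in Kahn's algorithm under two assumptions. First, an artifact should be listed after all of its dependencies. Second, the dependency edges are already available as a dict, e.g. extracted from Ivy's report, which is not shown. Ties are broken alphanumerically, so the output is deterministic. The function name topo_sort is hypothetical.

```python
import heapq


def topo_sort(deps):
    """Kahn's algorithm: every artifact appears after all of its dependencies.

    deps maps an artifact coordinate to the coordinates it depends on.
    Among artifacts that are ready at the same time, the smallest string
    comes first, which keeps the result deterministic.
    """
    nodes = set(deps)
    for targets in deps.values():
        nodes.update(targets)
    indegree = {n: 0 for n in nodes}  # number of not yet emitted dependencies
    dependents = {n: [] for n in nodes}
    for node, targets in deps.items():
        for t in set(targets):
            indegree[node] += 1
            dependents[t].append(node)
    ready = [n for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        n = heapq.heappop(ready)
        order.append(n)
        for d in dependents[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                heapq.heappush(ready, d)
    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```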

In a last step all temporary files are cleaned up, logging is concluded and the LogFile, if no other path has been specified, is also stored in DownDir.

3.5. Changelog

1.2.0 — 13.03.2024

Added the option --no-connection-check to disable the pre-check for an existing internet connection. Useful when running zs-mvn-harvester within QEMU.
Boosted the speed of the topological sorting method.

1.1.0 — 11.03.2024

Added the option --time to measure task-wise execution times.
Introduced the option --topo-sort for sorting the entries of the successfully retrieved artifacts in the output file libs.list topologically instead of alphanumerically.
Fixed logging flaws in artifact conflict situations.

1.0.2 — 25.02.2024

Fixed a bug which occurred when an older version of an artifact had been ordered but was superseded by a newer version. libs.list was then not written correctly, as the supersession was interpreted as if the ordered artifact were missing, although its newer version had actually been retrieved correctly.
Fixed a bug which appeared in the final cleanup phase when package group names ended in .css. Instead of only removing a temporary .css file produced by Ivy, the cleanup also tried to remove the package's *.css directory, resulting in an ugly error message. However, nothing was deleted, so this is merely a cosmetic fix.
Removed the screen clearing before the program's start.

Footnotes

When zs-mvn-harvester is used as a supplement to zs-apk-builder, only ~1.5 MB are necessary, because the two overlap in their dependencies.

References

[1] M. Zent (2021), Mobile Development: APK Building on Mobile, timeout.userpage.fu-berlin.de
[2] M. Zent (2021), Mobile Development: II. APK Builder, timeout.userpage.fu-berlin.de
[3] A. Butterfield & G. E. Ngondi (2016), A Dictionary of Computer Science: Oxford Quick Reference, 7th Ed., Oxford University Press
[4] Gradle User Manual, Dependency Management Terminology, docs.gradle.org
[5] Apache Maven Project, Introduction to the Dependency Mechanism, maven.apache.org
[6] R. Kikas, G. Gousios, M. Dumas & D. Pfahl (2017), Structure and Evolution of Package Dependency Networks, Proc. of the 14th Int. Conf. on Mining Software Repositories (MSR '17), IEEE Press, pp.102–112
[7] N. Forsgren (2021), The 2020 State of the OCTOVERSE: Securing the world's software, GitHub
[8] C. Soto-Valero, N. Harrand, M. Martin & B. Baudry (2020), A comprehensive study of bloated dependencies in the Maven ecosystem, Empir. Softw. Eng. 26:45
[9] Station 9 (2023), State of Dependency Management, Endor Labs
[10] Gradle User Manual, Dependency Management Terminology, docs.gradle.org
[11] Apache Maven Project, Maven Repositories, maven.apache.org
[12] MDN Web Docs, Package management basics, developer.mozilla.org
[13] W. D. Bryant (2015), International Conflict and Cyberspace Superiority: Theory and Practice, Routledge, pp.107
[14] International Telecommunication Union (2021), Facts and Figures 2021: 2.9 billion people still offline, itu.int
[15] International Energy Agency (2021), Global population without access to electricity by region, 2000-2021, iea.org
[16] E. Spruyt (2023), African Software Developers: Best Countries for Outsourcing in 2023, tunga.io
[17] P. King (2013), What Was It Like To Be A Programmer Without The Internet?, forbes.com
[18] International Telecommunication Union (2023), Measuring digital development: Facts and Figures, itu.int
[19] M. Zent (2021), Mobile Development: APK Building on Mobile — Implementing the Toolchain for Android, timeout.userpage.fu-berlin.de
[20] B. Byfield (2020), Distro Walk — Debian Derivatives, Linux Magazine 239/2020
[21] Debian Wiki, APT, wiki.debian.org
[22] The Termux Wiki, Package Management, wiki.termux.com
[23] M. Zent (2021), Mobile Development: APK Building on Mobile — Selecting the toolchain components, timeout.userpage.fu-berlin.de
[24] Android Developer Guides, Android NDK — Concepts, developer.android.com
[25] The Termux Wiki, Differences from Linux, wiki.termux.com
[26] Android Developer Guides, Android Studio — Add build dependencies, developer.android.com
[27] Java SE 8 Documentation, Java Archive (JAR) Files, docs.oracle.com
[28] Android Developer Guides, Android Studio — Create an Android library, developer.android.com
[29] Apache Maven Project, Maven Artifacts, maven.apache.org
[30] Apache Maven Project, POM Reference — Maven Coordinates, maven.apache.org
[31] Apache Ant Project, Ivy — The agile dependency manager, ant.apache.org
[32] Debian Wiki, APT, wiki.debian.org
[33] PIP Documentation, pip download, pip.pypa.io
[34] Y. Vo (2017), How to list/download the recursive dependencies of a debian package, stackoverflow.com
[35] DistroWatch database search for distributions based on Debian, distrowatch.com
[36] QEMU Wiki, Documentation/Networking, wiki.qemu.org
[37] Android Graphics Samples, PdfRendererBasic, github.com