
Mobile Development

III. Dependency Harvesters

Michael Zent

Published: Jan 17th, 2024

Updated: Mar 15th, 2024

This work emerged in the context of the project APK Building on Mobile [1][2]. It is meant to assist in collecting the packages and artifacts necessary to build Android applications from remote repositories onto mobile storage, though it is not limited to that purpose. In this way the development routine, which takes most of the time and happens predominantly offline, can be separated from the rather rare, usually online, dependency management.

All software presented here can be downloaded for free and used freely under the GPLv3. No responsibility is assumed for whatever you do with this software: it is provided as-is, without any warranty. Nonetheless, your remarks, bug reports, and suggestions are explicitly welcome at m.zent@fu-berlin.de.

The author also considers it necessary to assure hereby, that neither the following text, nor the ideas comprised, result even in the slightest from the usage of AI tools. Accordingly the author explicitly prohibits the use of this article, and of the author's hence resulting software, in automated systems without proper referencing. That is valid also for humans.

Cotton Harvester, patented in Nov.1885 by Owen T. Bugg, and manufactured by United States Cotton Harvester Co., New York, USA.

(Frank Leslie's Illustrated Newspaper, Vol.62, No.1600, pp.220-221, May 22, 1886)

1. A Small Digression

1.1. Dependency Management in a Nutshell

Work on a software project includes not only the design of its code but also the management of its Dependencies, i.e. the various kinds of Modules, aka Packages, required to build or run it. These may be other programs which are invoked to delegate certain tasks, or reusable units of associated resources to be integrated into the project. The latter can be Libraries, i.e. sets of precompiled routines that may be linked with the program at compile time or loaded dynamically at runtime, as well as Assets like raw source code, type specifications and values, images and media files, databases, etc. It is essential to note that Direct Dependencies, i.e. those which the developers themselves include directly in the project, will usually depend on further modules in turn. These are called Transitive Dependencies and have to be added to the project as well for a successful build. [3][4][5][6]
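To make the difference between direct and transitive dependencies concrete, here is a minimal sketch that expands a module into its full dependency set. The dependency table and the module names (app, libA, ...) are purely hypothetical and only serve the illustration.

```shell
# Hypothetical dependency table, one "module direct-dependency" pair per line.
DEPS='app libA
app libB
libA libC
libC libD'

# Print all direct and transitive dependencies of module $1, sorted.
closure() {
    todo=$1
    seen=''
    while [ -n "$todo" ]; do
        next=''
        for m in $todo; do
            # Look up the direct dependencies of m in the table.
            for d in $(printf '%s\n' "$DEPS" | awk -v m="$m" '$1 == m { print $2 }'); do
                case " $seen " in
                    *" $d "*) ;;                          # already known, skip
                    *) seen="$seen $d"; next="$next $d" ;; # new, expand it later
                esac
            done
        done
        todo=$next
    done
    printf '%s\n' $seen | LC_ALL=C sort
}

closure app
```

Here app has only two direct dependencies (libA and libB), yet four modules have to be present for a build, since libC and libD are pulled in transitively.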

Dependencies, needed at compile time as well as at runtime, can roughly be distinguished by whether they are native to the machine's system, i.e. its hardware and operating system. This leads to different types of package managers being in charge of them.

Even though assets are of non-native nature, it is rather the context they are used in that determines how they are delivered: stand-alone, or together with the native or non-native module they are associated with.

As we will see later, the boundaries between managers for native and non-native dependencies are fluid. A system-side package manager may also provide non-native components if required or meaningful, and the same holds vice versa. But the rough overall picture is as described above.

It should be noted that no one is forced to employ a dependency management system at all. One could provide all dependencies by hand and rely on one's own diligence alone. While that may be feasible for a limited number of modules, chiefly private custom ones, the approach is not viable for larger projects: surveys of repositories for various programming languages showed that, depending on the language examined, the provided packages have a median of up to 10 direct and up to 70 transitive dependencies which one would have to keep track of, with an upward trend. [7][8][9]

Having classified the potential dependencies, let's take a look at the tools we need to manage them. Dependency management comes to life in two parts, a remote and a local one. [10][11][12]

So, that was a short summary of Dependency Management in general. Now let's take a look at a specific use case where, instead of installing packages directly from a remote repository, one has to use an intermediate machinery.

1.2. Airgaps in Remote Repository Access

It is not rare that developers have to work on a project on a machine without access to online repositories, e.g. for security reasons. Some common scenarios are

In the following we will discuss how to overcome the aforementioned situations (let's summarize them, for the sake of simplicity, under the term Airgapping) in the light of developing an Android application on an airgapped handheld device.

1.3. Bridging the Airgap

Generally, to bridge an airgap one has to store the necessary components intermediately, i.e. without installing them, on another machine with remote repository access. When needed, they can then be transferred to the isolated computer, e.g. via a storage medium or a local connection. Note that the intermediary could be a handheld as well as a PC, and indeed there are more wireless than fixed-broadband subscriptions in the world [18].

Not only the project's direct dependencies but also their transitive dependencies have to be considered here. Naturally, an offline build of a project demands that all needed components are available locally. An online update should only be necessary if changes to the project require it.

With that said, some premises have to be stated beforehand, based on the slogan Developing for Mobile on Mobile of this Mobile Development article series. Despite that, the described approach is transferable to other scenarios in principle.

So, Fig.1 illustrates our scenario. The handheld with the project is separated from the online repositories by an airgap. Suitable intermediate devices, mobile or stationary, are available. Now it's time to choose the instruments.

Fig.1. Bridging an airgap between the remote repositories and the handheld productive system via a mobile or stationary intermediary as end-point for an insecure network connection on one side and a controlled data transfer on the other.

1.3.1. Accessing APT Repositories

As stated, the system of choice here is Debian GNU/Linux or one of its derivatives. Hence, the system-side package manager is APT [21], configured to access Termux' main and supplementary remote repositories [22]. These provide the programs needed for the APK building toolchain [23] as well as a wide range of native libraries one might want to use for a native portion of the Android application project [24]. Luckily, APT's tool apt-get provides an option to only download packages instead of installing them. Everything could be fine then: just download everything needed to the intermediate machine and ship the packages to the target handheld via any kind of trusted medium. Alas, there are two obstacles.

By wrapping APT, the APT Harvester resolves these issues.

1.3.2. Accessing Maven Repositories

With APT we have taken care of our basic system-related dependencies. Beyond that, Android development relies on Maven repositories [11][26] to receive the Java (JAR) and Android (AAR) archive files a project needs. The former is the classical file format for libraries written in Java, i.e. a non-native language, and can also contain assets like images and sound [27]. The latter is structurally nearly the same as an Android application package (APK) and can contain everything one might need to build an Android application project: JAR files as well as native dynamic libraries and assets of all kinds [28]. Because of this variety, the generalizing term Artifact [29] was introduced for any kind of file in a Maven repository that can be addressed via its Coordinates [30].

And again we meet some difficulties.

The Maven Harvester addresses these problems by wrapping Ivy, thereby also bringing it to mobile devices using Termux, and hiding its complexity from the user.

2. Harvest APT Repositories

A widespread package manager is apt, which is used by Debian, its derivatives, and also by Termux [32]. Unfortunately, and in contrast to e.g. Python's pip [33], there is no straightforward combination of options to achieve a recursive download of packages and their whole transitive dependency trees [34]. However, the APT Harvester wraps apt to provide this functionality.

2.1. Requirements

2.2. Installation

On Android:

On Debian GNU/Linux and its derivatives:

2.3. Usage

The general use pattern is as follows.

$ zs-apt-harvester [options] [packages]

2.3.1. Options

Providing options is not obligatory. When no options are present, default settings come into force. So a call of zs-apt-harvester without any options is equivalent to the following.

$ zs-apt-harvester -D $PWD -f $PWD/wish.list -L $PWD/log.txt

Specifying only a download directory dwndir is equivalent to:

$ zs-apt-harvester -D dwndir -f dwndir/wish.list -L dwndir/log.txt
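The way the file defaults follow the download directory can be sketched as below. This is only an illustration of the two equivalences stated above, not the actual option handling of zs-apt-harvester. The function name resolve_settings and its variables are hypothetical, and only the options -D, -f and -L are covered.

```shell
# Resolve download directory, wish file and log file from -D, -f and -L.
resolve_settings() {
    OPTIND=1
    down_dir=$PWD
    wish_file=''
    log_file=''
    while getopts 'D:f:L:' opt; do
        case $opt in
            D) down_dir=$OPTARG ;;
            f) wish_file=$OPTARG ;;
            L) log_file=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    # Paths that were not given fall back to files inside the download directory.
    : "${wish_file:=$down_dir/wish.list}"
    : "${log_file:=$down_dir/log.txt}"
    printf '%s %s %s\n' "$down_dir" "$wish_file" "$log_file"
}

resolve_settings -D dwndir    # prints: dwndir dwndir/wish.list dwndir/log.txt
```

Resolving the file paths only after all options have been read is what lets a lone -D move both default files along with the directory.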

2.3.2. Packages

The packages which shall be downloaded together with all their dependencies have to be identified by name, and can be provided via a file as well as on the command line, separated by any kind and number of whitespace characters. If no -f option is present to explicitly specify a file to be read from, the APT Harvester will look in the download directory for a file named wish.list. Additionally, all packages named on the command line will be taken into consideration.

In the end, all retrieved .deb package files can be found in the download directory and are listed in the accompanying file pkg.list. The initial package wishes are stored there as well, in the file wish.list.

2.3.3. Example

Let's assume one wants to harvest all packages needed for the XAMPP web server solution stack, e.g. into the directory ~/XAMPP. Then one could call:

$ zs-apt-harvester -D ~/XAMPP apache2 mariadb php perl

To avoid unnecessary retyping one could create a file ~/XAMPP/wish.list containing the line apache2 mariadb php perl. Note that one can actually separate the package names with any kind and number of whitespace characters to format the file as one likes. As zs-apt-harvester would by default look for a file wish.list in the download directory, the corresponding invocation could then be reduced to:

$ zs-apt-harvester -D ~/XAMPP

When one wants to keep the wishlist separately, e.g. as xampp.list in a directory ~/catalog, one can use the option -f for reference, so:

$ zs-apt-harvester -D ~/XAMPP -f ~/catalog/xampp.list

If one is interested in an additional package not listed in the given file, but for some reason does not want to alter it, one can simply add the package in question on the command line, i.e.:

$ zs-apt-harvester -D ~/XAMPP -f ~/catalog/xampp.list sqlite

Executing that command should yield the result shown in Fig.2.

Fig.2. Final output after running zs-apt-harvester -D ~/XAMPP -f ~/catalog/xampp.list sqlite to retrieve the XAMPP packages apache2, mariadb, php and perl as written in the xampp.list file, plus sqlite, into the directory XAMPP.

Looking afterwards into the download directory e.g. with ls ~/XAMPP one will see the downloaded .deb files, and text files generated by zs-apt-harvester.

2.4. Internals

The execution flow of the APT Harvester is depicted in Fig.3, comprising three phases — program setup, collection of the user's package wishes, and the organization of the actual downloads, including all dependencies.

Fig.3. The 3-phased approach of the APT Harvester to retrieve packages from APT repositories.

Setup

First of all, it has to be ensured that the APT Harvester is operative, i.e. that the program's dependencies as well as an internet connection are available. Otherwise the program terminates. Next, the core settings which control the further program behavior are initialized with default values. If one has other needs, one may provide custom values via the command line options of the APT Harvester. This way the following parameters, as represented in Fig.3, will be defined.

The WishList, still empty, will later list the names of all packages, including their transitive dependencies, which are to be harvested. Once logging has started and it is ensured that the DownDir actually exists, the next phase is initiated.

Get Wishes

As mentioned, one can specify the packages one wishes to download via a WishFile as well as CliWishes. The content of both — if existing — is merged into the WishList. If no wishes at all have been expressed, the program terminates. Otherwise the wishes will be sorted alphabetically, duplicates removed and written to the file wish.list in the download directory. With that the WishList includes all immediate demands of the user, without dependencies yet.
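The merging step can be sketched in a few lines of shell. The function name collect_wishes is hypothetical and the actual implementation may differ. The sketch only shows that whitespace-separated names from a file and from the command line can be normalized into one sorted, duplicate-free list.

```shell
# Merge wishes from a file (which may be missing) and from the command line
# into one name per line, sorted and without duplicates.
collect_wishes() {
    wish_file=$1
    shift
    {
        if [ -f "$wish_file" ]; then cat "$wish_file"; fi
        printf '%s\n' "$@"
    } | awk '{ for (i = 1; i <= NF; i++) print $i }' | LC_ALL=C sort -u
}

# Example with a throwaway wish file plus two command line wishes.
tmp=$(mktemp)
printf 'apache2 mariadb\n\tphp   perl\n' > "$tmp"
collect_wishes "$tmp" sqlite perl
rm -f "$tmp"
```

The example prints apache2, mariadb, perl, php and sqlite, one per line. An empty result would correspond to the case in which no wishes have been expressed.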

Get Packages

Before downloading a single package, the WishList is cross-checked with the list of packages available in the configured APT repositories. By invoking apt-get update the locally cached APT package list is refreshed. This step can be skipped using the --no-update option. Querying apt-cache pkgnames then results in a full list of currently known, available packages. If any package in the WishList cannot be found there, then the request is considered invalid and the program terminates.
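Such a cross-check could look like the following sketch. It compares two plain text files with one package name per line. The function name unknown_packages and the file names are hypothetical, and the stand-in list of available packages would in practice come from apt-cache pkgnames.

```shell
# Print every wished package ($1) that is missing from the available ones ($2).
# Returns 0 if at least one unknown package was found, 1 otherwise.
unknown_packages() {
    # -F fixed strings, -x whole lines only, -v invert: keep non-matching wishes.
    grep -Fxv -f "$2" "$1"
}

# Example with stand-in lists; on a real system one would first run
#   apt-cache pkgnames > avail.list
dir=$(mktemp -d)
printf 'apache2\nphp\nnosuchpkg\n' > "$dir/wish.list"
printf 'apache2\nmariadb\nperl\nphp\n' > "$dir/avail.list"
if unknown_packages "$dir/wish.list" "$dir/avail.list"; then
    echo 'request invalid' >&2
fi
rm -rf "$dir"
```

Matching whole lines only matters here, since otherwise a wish such as php would also be accepted because of a longer package name that merely contains it.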

Applying apt-cache depends to the now validated WishList extends it with all transitive dependencies. The WishList finalized in this way can then be fed to apt-get download to harvest everything needed from the APT repositories. The resulting output is placed into DownDir:
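One commonly used way to combine the two tools is sketched below, along the lines of the approach discussed in [34]. Whether zs-apt-harvester uses exactly these apt-cache flags is an assumption, and the function names package_names and harvest are hypothetical. The function harvest is only defined here, not run, since it needs APT and a network connection.

```shell
# Keep only the package names from `apt-cache depends --recurse` output.
# Names start in column 0, dependency lines are indented, and virtual
# packages appear in angle brackets, so they are filtered out as well.
package_names() {
    grep '^[[:alnum:]]' | LC_ALL=C sort -u
}

# Assumed pipeline: expand the wishes to their hard dependencies,
# then download the .deb files into the directory given as $1.
harvest() {
    down_dir=$1
    shift
    names=$(apt-cache depends --recurse --no-recommends --no-suggests \
                --no-conflicts --no-breaks --no-replaces --no-enhances "$@" |
            package_names) || return 1
    ( cd "$down_dir" && apt-get download $names )
}

# The filter alone, applied to a shortened sample of apt-cache output.
printf 'apache2\n  Depends: apache2-bin\n |Depends: <httpd>\napache2-bin\n  Depends: libc6\n<httpd>\nlibc6\n' |
    package_names
```

The sample prints apache2, apache2-bin and libc6. A call such as harvest ~/XAMPP apache2 would then place the corresponding .deb files into that directory, because apt-get download stores them in the current working directory.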

In a last step all temporary files are cleaned up, logging is concluded and the LogFile, if no other path has been specified, is also stored in DownDir.

2.5. Changelog

1.1.0 — 14.01.2024

3. Harvest Maven Repositories

One basic step in application development is to record the modules on which the project depends, so that they can be included during the build process. Module files that are not one's own usually have to be retrieved from remote repositories before they can be employed in the local build. In doing so it is necessary to consider their whole dependency tree, i.e. also the transitive dependencies.

The Maven Harvester is a tool focusing on transitive module retrieval from Apache Maven repositories [11]. Though initially its main purpose was to assist in the provision of dependencies for building APKs with zs-apkbuilder, it can be applied to any Maven packages of interest.

To accomplish its aim it employs Apache Ivy [31], a tool for a variety of project dependency management tasks. However, zs-mvn-harvester hides Ivy's complexity and provides a simple interface that concentrates on transitive dependency resolution and serves the results in a ready-to-use way.

3.1. Requirements

3.2. Installation

On Android:

On Debian GNU/Linux and its derivatives:

3.3. Usage

The general use pattern is as follows.

$ zs-mvn-harvester [options] [artifacts]

3.3.1. Options

Providing options is not obligatory. When no options are present, default settings come into force. So a call of zs-mvn-harvester without any options is equivalent to the following.

$ zs-mvn-harvester -D $PWD -f $PWD/wish.list -R $PWD/repo.list -L $PWD/log.txt

Specifying only a download directory dwndir is equivalent to:

$ zs-mvn-harvester -D dwndir -f dwndir/wish.list -R dwndir/repo.list -L dwndir/log.txt

3.3.2. Artifacts

The artifacts which shall be downloaded together with their dependencies have to be identified using the XML namespace format for their Maven coordinates, i.e. groupId:artifactId:version [30]. These can be provided via a file as well as the command line, separated by any kind and number of whitespace characters. If no -f option is present to explicitly specify a file to be read from, the Maven Harvester will look in the download directory for a file named wish.list. Additionally, all artifacts named by command line will be taken into consideration.
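A wish in this format can be taken apart with plain shell means, as the following sketch shows. The function name parse_coords is hypothetical. The sketch accepts exactly the three-part form used in this article and rejects everything else, although Maven itself also knows longer coordinate forms.

```shell
# Split "groupId:artifactId:version" into its parts; fail on anything else.
parse_coords() {
    case $1 in
        *:*:*:*|:*|*::*|*:) return 1 ;;   # too many or empty parts
        *:*:*) ;;                         # exactly three parts
        *) return 1 ;;                    # too few parts
    esac
    old_ifs=$IFS
    IFS=:
    set -- $1                             # split at the colons
    IFS=$old_ifs
    printf 'group=%s artifact=%s version=%s\n' "$1" "$2" "$3"
}

parse_coords androidx.appcompat:appcompat:1.4.2
# prints: group=androidx.appcompat artifact=appcompat version=1.4.2
```

Rejecting malformed wishes early keeps a mistyped coordinate from surfacing later as an unresolved artifact.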

In the end, all retrieved artifact files can thereafter be found in the download directory. The accompanying file libs.list names them all, and can e.g. be fed to zs-apkbuilder to determine an APK project's dependencies. Correspondingly the failed retrievals are listed in the file miss.list, supporting the search for errors in the dependency tree and a manual retrieval if need be. The initial artifact wishes are preserved in the file wish.list.

In addition, the archive cache.tar.gz stores retrieval data generated by Ivy, so artifacts that are already present won't be fetched again, saving time and internet traffic. If the archive has been deleted, the dependency resolution process will have to start from scratch, and all artifacts will be downloaded anew.

3.3.3. Example

Let's assume one wants to harvest all artifacts needed for the project PdfRendererBasic from the Android Graphics Samples [37], e.g. into the directory ~/PdfRendererBasic/deps. Then one could call:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps androidx.appcompat:appcompat:1.4.2 androidx.arch.core:core-testing:2.1.0 androidx.fragment:fragment-ktx:1.5.1 androidx.lifecycle:lifecycle-livedata-ktx:2.5.1 androidx.lifecycle:lifecycle-viewmodel-ktx:2.5.1 com.google.truth:truth:1.0 org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.7.10 org.jetbrains.kotlinx:kotlinx-coroutines-android:1.6.1

To avoid unnecessary retyping one could create a file ~/PdfRendererBasic/deps/wish.list containing the entries

androidx.appcompat:appcompat:1.4.2
androidx.arch.core:core-testing:2.1.0
androidx.fragment:fragment-ktx:1.5.1
androidx.lifecycle:lifecycle-livedata-ktx:2.5.1
androidx.lifecycle:lifecycle-viewmodel-ktx:2.5.1
com.google.truth:truth:1.0
org.jetbrains.kotlin:kotlin-stdlib-jdk7:1.7.10
org.jetbrains.kotlinx:kotlinx-coroutines-android:1.6.1

Note that one can actually separate the artifact coordinates with any kind and number of whitespace characters to format the file as one likes. As zs-mvn-harvester would by default look for a file wish.list in the download directory, the corresponding invocation could then be reduced to:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps

When one wants to keep the wishlist separately, e.g. as PdfRendererBasic.list in a directory ~/catalog, one can use the option -f for reference, so:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps -f ~/catalog/PdfRendererBasic.list

If one is interested in additional packages not listed in the given file, but for some reason does not want to alter it, one can simply add the packages in question on the command line, e.g. here one could add some libraries employed for program testing:

$ zs-mvn-harvester -D ~/PdfRendererBasic/deps -f ~/catalog/PdfRendererBasic.list androidx.test.espresso:espresso-core:3.4.0 androidx.test.ext:junit:1.1.3 androidx.test.ext:truth:1.4.0 androidx.test:core:1.4.0

Executing that command should yield the result shown in Fig.4.

Fig.4. Final report after zs-mvn-harvester -D ~/PdfRendererBasic/deps -f ~/catalog/PdfRendererBasic.list androidx.test.espresso:espresso-core:3.4.0 androidx.test.ext:junit:1.1.3 androidx.test.ext:truth:1.4.0 androidx.test:core:1.4.0 has been run to retrieve the dependencies of the PdfRendererBasic project as recorded in the corresponding .list file, plus testing libraries, into the project's local dependency directory deps. Here the relation #Resolved + #Unresolved = modules.number - modules.evicted holds.

Looking afterwards into the download directory e.g. with ls ~/PdfRendererBasic/deps one will see the downloaded artifacts, and further files generated by zs-mvn-harvester.

3.4. Internals

As depicted in Fig.5, three phases have to be passed to fulfil the user's wishes — program setup, wish collection and the actual download of the desired artifacts and all their dependencies.

Fig.5. The 3-phased approach of the Maven Harvester to retrieve packages from Maven repositories.

Setup

Firstly, it has to be ensured that zs-mvn-harvester is operative, i.e. the program's dependencies as well as an internet connection are available. Otherwise the program is terminated. Next the core settings which control the further program behavior are initialized with default values. If one has other needs, one may provide custom values via the Command Line options of zs-mvn-harvester as described in Usage. This way the following parameters will be defined.

Logging starts. After ensuring that the DownDir actually exists, zs-mvn-harvester will check it for the archive cache.tar.gz. If available, it will be unpacked. It contains property files generated by Ivy for each module it has loaded from a repository into this directory in previous runs. By means of these, in the next run Ivy will be able to reconcile the user's wishes with the current content of DownDir and avoid unnecessary re-downloads, as described below.

Get Wishes

As mentioned, one can specify the artifacts one wishes to download via a WishFile as well as CliWishes. The content of both, if existing, is merged into a WishList. If no wishes have been expressed, the program terminates. Otherwise the WishList will be sorted alphabetically, duplicates removed, written to the file wish.list in the download directory, and translated into the Ivy Module Descriptor File ivy.xml.
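The translation into a module descriptor can be sketched as follows. The dependency element with its attributes org, name and rev is standard Ivy syntax, whereas the function name wishes_to_ivy and the values in the info element are placeholders. The file actually generated by zs-mvn-harvester may contain further settings.

```shell
# Turn "groupId:artifactId:version" lines on stdin into an Ivy module descriptor.
wishes_to_ivy() {
    printf '<ivy-module version="2.0">\n'
    # Placeholder identity of the (virtual) module that owns the wishes.
    printf '  <info organisation="harvester" module="wishes"/>\n'
    printf '  <dependencies>\n'
    awk -F: 'NF == 3 { printf "    <dependency org=\"%s\" name=\"%s\" rev=\"%s\"/>\n", $1, $2, $3 }'
    printf '  </dependencies>\n'
    printf '</ivy-module>\n'
}

printf '%s\n' 'com.google.truth:truth:1.0' 'androidx.appcompat:appcompat:1.4.2' |
    wishes_to_ivy
```

Each wish becomes one dependency element, so Ivy treats the whole WishList as the direct dependencies of a single module and resolves the transitive ones from there.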

Get Artifacts

First the remote repositories have to be configured, from which the artifacts shall be retrieved. If a user-defined RepoFile is available, the list of repository URLs will be read from there. Otherwise a default RepoFile will be written into DownDir, pointing to Google's Maven Repository and the Maven Central Repository. This order has been chosen as Android application projects will more frequently need artifacts from the first one. Finally, the repositories are written into the Ivy Settings File ivysettings.xml, together with DownDir to tell Ivy where to put the results.
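A settings file with such an ordered list of repositories might be generated as in the sketch below. The chain and ibiblio resolvers with m2compatible="true" are regular Ivy features for Maven-style repositories. The function name repos_to_ivysettings, the resolver names and the two URLs are assumptions for illustration, and the part that tells Ivy about DownDir is left out.

```shell
# Write Ivy settings that chain Maven-style repositories in the given order.
# The repository URLs are read from stdin, one per line.
repos_to_ivysettings() {
    printf '<ivysettings>\n'
    printf '  <settings defaultResolver="repos"/>\n'
    printf '  <resolvers>\n'
    printf '    <chain name="repos">\n'
    awk 'NF { printf "      <ibiblio name=\"repo%d\" m2compatible=\"true\" root=\"%s\"/>\n", ++n, $1 }'
    printf '    </chain>\n'
    printf '  </resolvers>\n'
    printf '</ivysettings>\n'
}

# Assumed default order: Google's Maven Repository first, then Maven Central.
printf '%s\n' 'https://dl.google.com/dl/android/maven2/' \
              'https://repo1.maven.org/maven2/' |
    repos_to_ivysettings
```

A chain resolver queries its members in the order they are listed, which is how the preference for Google's repository would carry over into Ivy.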

Now the configuration is complete, consisting of the following files, serving as Ivy's input:

Note that no files will ever be deleted by zs-mvn-harvester; only new ones are added. Also keep in mind that two artifact instances are considered equal only when they have exactly the same Maven coordinates. So, when an artifact is already present in a higher version but a lower version is now demanded, the request is not yet considered satisfied. The artifact of the higher version remains untouched, and the one of the lower version is added. An artifact that was not requested but is present in the cache and/or DownDir is not affected by the harvester.

The output of Ivy is put into DownDir, though with some transformations:

In a last step all temporary files are cleaned up, logging is concluded and the LogFile, if no other path has been specified, is also stored in DownDir.

3.5. Changelog

1.2.0 — 13.03.2024

1.1.0 — 11.03.2024

1.0.2 — 25.02.2024

Footnotes

  1. When zs-mvn-harvester is used as a supplement for zs-apk-builder, then only ~1.5MB are necessary because they overlap in their dependencies.

References

  1. M. Zent (2021), Mobile Development: APK Building on Mobile, timeout.userpage.fu-berlin.de
  2. M. Zent (2021), Mobile Development: II. APK Builder, timeout.userpage.fu-berlin.de
  3. A. Butterfield & G. E. Ngondi (2016), A Dictionary of Computer Science: Oxford Quick Reference, 7th Ed., Oxford University Press
  4. Gradle User Manual, Dependency Management Terminology, docs.gradle.org
  5. Apache Maven Project, Introduction to the Dependency Mechanism, maven.apache.org
  6. R. Kikas, G. Gousios, M. Dumas & D. Pfahl (2017), Structure and Evolution of Package Dependency Networks, Proc. of the 14th Int. Conf. on Mining Software Repositories (MSR '17), IEEE Press, pp.102–112
  7. N. Forsgren (2021), The 2020 State of the OCTOVERSE: Securing the world's software, GitHub
  8. C. Soto-Valero, N. Harrand, M. Martin & B. Baudry (2020), A comprehensive study of bloated dependencies in the Maven ecosystem, Empir. Softw. Eng. 26:45
  9. Station 9 (2023), State of Dependency Management, Endor Labs
  10. Gradle User Manual, Dependency Management Terminology, docs.gradle.org
  11. Apache Maven Project, Maven Repositories, maven.apache.org
  12. MDN Web Docs, Package management basics, developer.mozilla.org
  13. W. D. Bryant (2015), International Conflict and Cyberspace Superiority: Theory and Practice, Routledge, pp.107
  14. International Telecommunication Union (2021), Facts and Figures 2021: 2.9 billion people still offline, itu.int
  15. International Energy Agency (2021), Global population without access to electricity by region, 2000-2021, iea.org
  16. E. Spruyt (2023), African Software Developers: Best Countries for Outsourcing in 2023, tunga.io
  17. P. King (2013), What Was It Like To Be A Programmer Without The Internet?, forbes.com
  18. International Telecommunication Union (2023), Measuring digital development: Facts and Figures, itu.int
  19. M. Zent (2021), Mobile Development: APK Building on Mobile — Implementing the Toolchain for Android, timeout.userpage.fu-berlin.de
  20. B. Byfield (2020), Distro Walk — Debian Derivatives, Linux Magazine 239/2020
  21. Debian Wiki, APT, wiki.debian.org
  22. The Termux Wiki, Package Management, wiki.termux.com
  23. M. Zent (2021), Mobile Development: APK Building on Mobile — Selecting the toolchain components, timeout.userpage.fu-berlin.de
  24. Android Developer Guides, Android NDK — Concepts, developer.android.com
  25. The Termux Wiki, Differences from Linux, wiki.termux.com
  26. Android Developer Guides, Android Studio — Add build dependencies, developer.android.com
  27. Java SE 8 Documentation, Java Archive (JAR) Files, docs.oracle.com
  28. Android Developer Guides, Android Studio — Create an Android library, developer.android.com
  29. Apache Maven Project, Maven Artifacts, maven.apache.org
  30. Apache Maven Project, POM Reference — Maven Coordinates, maven.apache.org
  31. Apache Ant Project, Ivy — The agile dependency manager, ant.apache.org
  32. Debian Wiki, APT, wiki.debian.org
  33. PIP Documentation, pip download, pip.pypa.io
  34. Y. Vo (2017), How to list/download the recursive dependencies of a debian package, stackoverflow.com
  35. DistroWatch database search for distributions based on Debian, distrowatch.com
  36. QEMU Wiki, Documentation/Networking, wiki.qemu.org
  37. Android Graphics Samples, PdfRendererBasic, github.com

Copyright © 2021 - 2024 Michael Zent
