Project Outline

Project Planning

All well thought-out projects need some sort of planning. Now, I know, most hackers out there tend to start with hacking, diving head-first into the next hack-attack and not giving up until the application is somewhat usable. I am often no different. Unfortunately, this too often leaves us with working but sub-par solutions. We get to demonstrate our skills and how fast we can put them to use, but if we don't dare to clean up afterwards, all we really leave behind is a big mess.

Extreme Programming vs. the V-Model

Major parts of the software industry have worked that way for the past decades. Extreme programming brings real advantages: creating proofs of concept quickly and delivering minimum viable products as fast as possible, so your client knows you are doing what they pay you for, is great! Unfortunately clients, and therefore also middle to upper management, tend to have a hard time seeing reasons to clean up the mess that was generated in the hurry of that fast initial work. Why clean up something that obviously works? It is only months, maybe years later, after features upon features have been built onto that initial foundation, that the code starts to feel slow, or finding and fixing bugs becomes so tedious that management might consider a major overhaul of the project.

Then, naturally, questions of pay arise. What might have been a few extra hours every now and then during development and maintenance has now become a complete rewrite from scratch, a huge mission that usually lacks both the hours and the motivation to complete. The software that took so many hours to write has reached a state where keeping it means living with a fairly big pile of unknown risk, and neither fixing it nor rewriting it from scratch seems viable. Such software is then too often dismissed and replaced with some other solution.

On the other side of the development spectrum there is the V-model. Named after the cascade of requirement definitions (from high-level to low-level) and - on the opposite side - the acceptance tests for each requirement, the V-model is commonly used in industrial development, where solutions are developed over long periods and have safety requirements that cannot rely on the do-fast-and-never-clean-up methodology of modern, more agile workflows. In practice this means long, tedious meetings where bleak papers are discussed - for many, sheer horror just to think about.

As complicated as it may sound, the V-model is a proven workflow for guaranteeing that the solution you build actually meets your requirements, but it is vastly more expensive. So - being an individual developer targeting small to medium-sized ventures - I'd suggest something in between: thinking about requirements, writing them down and defining acceptance tests early on is great, but so is remaining flexible about the actual solutions.

Risk

But let's first talk about (potential) risk - and why we should care, at least if we mean to build solutions that are supposed to last. The risk I am talking about comes in many forms.

Available Expertise

The time to deliver bug fixes should be (at least somewhat) foreseeable: you want to be able to assure that fixing some bug will cost anything between two minutes and (worst case) a day or a week of an expert worker's time. You also want to be sure of having a team that is capable of finding bugs in the places they emerge, and not just duct-taping over some code that happens to show a symptom of your bug - otherwise you just willingly accumulate risk until your product finally collapses under its own weight and - depending on the importance of said product - you lose all of your business.

Supply Chain

You want to be safe against issues emerging from the outside: if a piece of software you have built upon happens to show security flaws, you want to be able to either replace it or patch it, so it won't affect the quality of your product. This does not just hold for software; it is equally true for hardware or worker-power. Unforeseeable events happen: a company goes bankrupt, key people quit (although people usually leave with some sort of grace period) or become otherwise unavailable (due to illness or death), or natural disasters strike - fires, earthquakes, a high concentration of cosmic rays that deletes all the disks in your server. Assessing risk is about sketching failure scenarios and preparing mitigations.

How long will it take to train existing staff or hire new experts to fill the gap? Where will you get your data back from, if your main site vanishes after an earthquake? Will you be able to work on your datasets once your proprietary database provider goes bankrupt?

Reproducibility

Another risk factor is reproducibility. Today it is usually hidden well beneath other, more superficial issues, but working in non-reproducible environments is - due to the unknown risk it injects - simply bad practice. One worker quits and you hand the next one available a simple task, but instead of fixing the bug or implementing the feature, the new worker first has to re-build a functioning environment through an unnecessarily difficult game of trial-and-error (which is how we currently, usually, do it). And - even better - as soon as said worker has implemented the feature, the same game may have to be repeated for the CI/CD pipeline or - even worse - the production environment. Now, I know, most development is eased through the ubiquitous use of docker environments. These are - of course - very cool and leave us with more time to actually develop, while they just work™.

Every hacker who has tried to reproduce a bug that happens in one of the (usually three) environments - local development, CI/CD, production - but not in the others knows why this is crucial.

But what about problems within a docker container? Do we feel capable of reverse-engineering those containers? Can we re-create them easily, pin down problems in them and eliminate them once and for all? Are we confident that an outage of, let's say, dockerhub (or a similar service) wouldn't cause problems in our pipelines? Do we mirror all necessary parts on premise?

Manageable Risk

The cruel thing about risk is that mitigation costs money while the risk itself may stay invisible for a long time. But when it hits, it usually hits hard. So what is the right thing to do? Keep risk in mind, build solid foundations for the future, regularly evaluate the risks of your projects and don't wait too long to fix outstanding issues. This should boil down to establishing solid practices among your staff, reviewing changes, code and documentation - preferably by workers from opposite sides of the spectrum (junior/senior, devops/coder, etc.) - and keeping track of risk potential.

The Project

What does the software do?

I play in a band. Right now we're still small and one important step to overcome this situation is to promote ourselves. Since we do not have any professionally made videos or other promotional material published, we ought to share material that has either not yet been released or is not fit for publication (for whatever reason) with promoters, venue hosts and other interested people. We do not want to publish any of the material and neither do we want to

a) pay for a (proprietary) service, b) feed one of the big surveillance technology corporations or c) waste more resources (computational, economic, social, etc.) than necessary.

Simply put, we want to:

  • host files for download,
  • register email-addresses for each download and
  • keep simple statistics on the downloads of each file.

Of course, less geeky people would just opt for an already existing solution, but creating this documentary blog is a purpose in and of itself.

User Stories

To illustrate further - and more precisely - what the app should be capable of doing, I listed some user stories. They are of course not exhaustive and leave plenty of space for future improvement.

File Upload

As an administrative user, I want to be able to upload files for sharing, e.g. `music-video.mkv' or `my-band-files.zip'.

Stating Intent

As an administrative user, I want to state at least one intent with each file I want to make available for download, e.g. "summer promotion 2024", "club tour Germany".

Register Email Addresses

As a user receiving a download code, I can state my email address and receive the actual download link through email.

File Download

As a user who stated their email address, I can download the file that is linked in an email.

Activity Insight

As an administrative user, I want to know who downloads what when (and how often).
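
To make these stories a little more concrete, here is a minimal sketch of the kind of record the last one could be built on. I am using Python purely for illustration; the field names are my own assumptions and nothing about the actual implementation is decided yet.

  from dataclasses import dataclass, field
  from datetime import datetime, timezone
  from collections import Counter

  @dataclass
  class DownloadEvent:
      """One completed download: who fetched which file, and when."""
      file_name: str
      email: str
      timestamp: datetime = field(
          default_factory=lambda: datetime.now(timezone.utc))

  def downloads_per_file(events: list[DownloadEvent]) -> Counter:
      """Answer the 'how often' part by counting downloads per file."""
      return Counter(event.file_name for event in events)

With records like these, "who downloads what when" is a simple filter over the event list and "how often" a simple count.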

(Security) Considerations

There are some not-so-trivial aspects of this venture that explain why some parts will not be built in the easiest way possible. Why not just set up a static web server, upload the files and email the links?

Unguessable filenames

Since we do not want to publish any of the files provided for download through the application, it is strictly necessary to prevent people from guessing download addresses to gain access to the files. An obvious way to achieve this could be to use hashing for download codes and URLs, coupled with means to hinder users from brute-forcing random-looking addresses.
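
To give an idea of what such codes might look like, here is a minimal sketch in Python. It draws a random, URL-safe token from the standard library rather than hashing anything - an assumption on my part, since the concrete scheme is not decided here. With roughly 256 bits of entropy per code, guessing a valid address becomes practically impossible; independently of that, the server would still want to rate-limit lookups of non-existent codes to make brute-forcing unattractive.

  import secrets

  def new_download_code(n_bytes: int = 32) -> str:
      """Return an unguessable, URL-safe download code (~256 bits of entropy)."""
      return secrets.token_urlsafe(n_bytes)

  # hypothetical use: https://example.org/download/<code>
  code = new_download_code()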

Ensure Email Validity

Since we want to keep track of who downloads which file and when, we need some means of ensuring a user's identity. A simple - but of course not completely fool-proof - way is to send the final download links only through email. This ensures the validity of an email address and allows us to implement single-use download addresses.
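
As a rough sketch of how such single-use links could be issued and redeemed - again in Python, with invented names and an in-memory dict standing in for whatever storage the application will actually use:

  import secrets

  # token -> (email, file name); a real application would persist this
  pending_downloads: dict[str, tuple[str, str]] = {}

  def issue_link(email: str, file_name: str) -> str:
      """Create a single-use download link, to be sent to the given address only."""
      token = secrets.token_urlsafe(32)
      pending_downloads[token] = (email, file_name)
      return f"https://example.org/download/{token}"  # example.org is a placeholder

  def redeem(token: str) -> tuple[str, str] | None:
      """Hand out the file once, then invalidate the token."""
      return pending_downloads.pop(token, None)

Because the link only ever travels through the stated email address and the token is removed on first use, a completed download doubles as confirmation that the address was valid.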

How to accomplish the project

In computing, there are usually infinitely many ways to achieve a goal. Programmers, coders and hackers tend to fall back on their own preferences, be it because they are fastest with their best-known stack or because of a lack of (better) alternatives. Of course, we could hack this project together with a bunch of bash scripts, keeping track of state in text files. Or we could re-invent the wheel by writing the functionality as a module for one of the better-known web servers out there. Or we could opt for the fanciest-looking node module on the market. As mentioned above, early decisions should be considered wisely. So, for this project at least, I will opt for well-established free software with a trustworthy history and a community that takes reasonable approaches to mitigating security incidents.

The Components

Boiling the whole idea down to its core, we need a few components for project deployment (without re-inventing any wheels):

  1. a host of sorts
  2. which runs a web server,
  3. means to send email and - of course -
  4. the web-application itself.

For development and documentation (e.g. the blog you are reading right now) we need:

  1. an online git repository,
  2. a (reproducible?) development environment and
  3. some means to publish static HTML.

Author: Gabriel <gabber> Wicki

Created: 2024-10-14 Mon 23:56