An Exercise in New Tech - Identifying Problems and Searching for Solutions - A Case Study

  • I want to solve a problem, as quickly as possible.

The following post opens up my mind, in all its glorious ignorance and folly - fully uncensored: so you can see the thought process by which a solution is derived.

The Real Problem:

My site presents users with a series of links - files which can be downloaded.

  • Users may want to download 100 files, at once. I do not want them to click on each link separately to download them: that would be far too cumbersome.

  • Some users will have an unreliable internet connection. I do not want their downloads to restart every time the connection is interrupted. Not sure how this can be solved.

    Perhaps AWS has a ready made solution?

    Dropbox insists on having users download a client application, but I know my users, and they’re not going to want to download and install a program to download large files if it can be avoided. The only other alternative I can think of is WebAssembly: you get the benefit of a custom client solution without forcing users to install software. But will this work if the browser somehow closes or if there is a power failure? I don’t know.

    Browser standards for resuming interrupted downloads are poor, and until browsers catch up, a WebAssembly solution could save a lot of people a lot of headaches. I would love to waltz down this path and investigate a solution for this. Perhaps I could take an existing C download client, compile it to WebAssembly, and have it just work (TM) if users have an interruption. Wishful thinking. There would likely be problems associated with this.
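
One building block that any resumable client (native, WebAssembly or otherwise) would probably lean on is the HTTP Range header, which S3 honours. Below is a minimal sketch in Python, using only the standard library. The names `range_header` and `resume_download` are my own, and the sketch assumes the server answers a range request with status 206. It does not handle the case where the file on disk is already complete (the server would reply 416, which `urlopen` raises as an error).

```python
import os
import urllib.request


def range_header(bytes_on_disk: int) -> dict:
    """Build a header asking the server to skip the bytes we already have."""
    if bytes_on_disk <= 0:
        return {}
    return {"Range": f"bytes={bytes_on_disk}-"}


def resume_download(url: str, dest: str, chunk_size: int = 64 * 1024) -> None:
    """Download url to dest, continuing from a partial file if one exists."""
    have = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range, so we append.
        # 200 means it sent the whole file again, so we start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download` again after an interruption picks up from the bytes already on disk, which is the behaviour I want for users on flaky connections.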

Possible Solutions

Back end solutions

Download and ZIP

  • Download all the files onto your Heroku server, zip them, and send them to users. This is a quick and dirty solution, but it has its limitations: file size and timeout limits. A 1 GB download is not going to work if you only have 500 MB at your disposal. Besides, that might take longer than 30 seconds - the limit imposed by Heroku before timing out your user’s requests.
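
The two limits above can be turned into a quick back-of-the-envelope check. This is only a sketch: the function name and the throughput figures are made up for illustration, and the disk check is optimistic because it ignores the space the zip itself needs.

```python
HEROKU_TIMEOUT_S = 30  # the request limit mentioned above


def can_zip_on_server(total_bytes, free_disk_bytes, throughput_bytes_per_s,
                      timeout_s=HEROKU_TIMEOUT_S):
    """Return (ok, reasons) for the download-then-zip approach."""
    reasons = []
    # The source files must at least fit on disk (optimistic: the zip
    # output needs room as well).
    if total_bytes > free_disk_bytes:
        reasons.append("files do not fit on disk")
    # Fetching alone must finish before the request times out.
    seconds_needed = total_bytes / throughput_bytes_per_s
    if seconds_needed > timeout_s:
        reasons.append(f"needs {seconds_needed:.0f}s, limit is {timeout_s}s")
    return (not reasons, reasons)


# 1 GB of files, 500 MB of disk, a hypothetical 50 MB/s from S3:
print(can_zip_on_server(1_000_000_000, 500_000_000, 50_000_000))
```

With those assumed numbers the fetch would take 20 seconds, so the timeout is not the blocker: the disk is.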

Stream from Heroku

  • There are libraries out there like Zipline, which look extremely promising. Zipline solves the above problem (file size limitations), but it creates another one: we will still be streaming from our server, and we want to minimise stress on our server. Why not let AWS servers handle that?
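
To show why streaming sidesteps the file size limit, here is a rough sketch of the general idea in Python, using the standard library's `zipfile`. This is not Zipline's code or API, just an illustration of the technique: the archive is written to a sink that cannot seek, and each piece is handed on as soon as it is produced, so the whole zip never sits on disk or in memory. `stream_zip` and `_ChunkSink` are hypothetical names.

```python
import zipfile
from typing import Iterable, Iterator, Tuple


class _ChunkSink:
    """Write-only, non-seekable target that collects what zipfile writes."""

    def __init__(self) -> None:
        self._chunks = []

    def write(self, data) -> int:
        self._chunks.append(bytes(data))
        return len(data)

    def flush(self) -> None:
        pass

    def drain(self) -> bytes:
        """Return everything written since the last drain."""
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def stream_zip(files: Iterable[Tuple[str, Iterable[bytes]]]) -> Iterator[bytes]:
    """Yield zip bytes piece by piece for (name, chunks) pairs."""
    sink = _ChunkSink()
    # ZIP_STORED: no compression, entries are simply wrapped in an archive.
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, chunks in files:
            with zf.open(name, mode="w") as entry:
                for chunk in chunks:
                    entry.write(chunk)
                    data = sink.drain()
                    if data:
                        yield data
    # Closing the archive writes the central directory; send that last.
    tail = sink.drain()
    if tail:
        yield tail
```

In a real app each `chunks` iterable would be the body of a remote file read a piece at a time, and the yielded bytes would go straight into the HTTP response. The bytes still pass through our server, which is exactly the cost described above.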

Use AWS Lambda

  • Use Lambda to directly stream a series of files into another zip file, stored on an S3 bucket. Send the final file to users. Let them download that file, directly from AWS servers.

Sounds promising, but there are some hidden snags here: I’m not a Lambda expert. Though Lambda fanboys be raving, I could not find Lambda resources for Rails developers. There might be difficulties in terms of permissions, and in simply getting things to work. It’s a new area for me, and new entails risk, cost blowouts and time sinks. We do not have the luxury of waiting: the war might be lost by then!

Front End solutions

How do we present the zip file to users?

The Lambda Solution

  • We could use SSE (Server-Sent Events) or Action Cable. The former is currently not scalable. The latter could work, but I would prefer the former, since the communication will only be one-way. This affords an opportunity for a PR on Rails itself, but do we have the time for that? We could use Action Cable and that might work.

  • Email: the least friendly user experience, but also the fastest for us to implement. Users will have to log in to their email, in order to download the zip file.
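
For the SSE option, the message telling the browser "your zip is ready" is just a few lines of text sent over a response with the `text/event-stream` content type. Here is a small sketch of the wire format in Python. The event name `zip_ready` and the URL are made up for illustration.

```python
import json
from typing import Optional


def sse_event(data: dict, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events message with a JSON payload."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


message = sse_event({"url": "https://example.com/files.zip"}, event="zip_ready")
print(message)
```

On the browser side, an `EventSource` listening for `zip_ready` would parse the JSON and offer the link. Nothing ever needs to travel back from the browser, which is why one-way SSE appeals to me here.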

The Streaming Solution

  • Just present the stream to users as soon as it starts. Use the standard send_file method.

The Final Determination

  • The AWS Lambda solution involves too many unknowns: (i) it involves time - in learning new technologies, (ii) I may face unexpected obstacles, and (iii) the fully fledged solution might involve PRs to the Rails code base, if I were to pursue the SSE (Server-Sent Events) course of action. It will likely be more robust than the Zipline solution, but it will take more time (an unknown and indefinite amount) and there is a real risk it could take 3 weeks, instead of 1.

  • We want to deliver something quickly. I like the Lambda solution, but first, I’ll have to do the Zipline solution. Once that is successful, and if I have time (probably not), I can indulge in more robust solutions. It’s sad for me, because I really want to run with Lambda (the cool new tech), but time and business exigencies are forcing me into a quick solution which solves the problem at hand, despite me not wanting to pursue that for emotional reasons.

The Conclusion

  • I adopted the Zipline solution. It took me 1 full day. Most of that day was spent tweaking Elm for my specific use case, so that I could download files via that front-end framework. It could have taken 2 weeks if I had tried to learn Lambda and solve every problem which came my way. If I were George McClellan, I would have prepared myself with a lengthy AWS course etc. before I took to the field. But I’m well read on Ulysses Grant, and he knew, better than me and better than any man in the world, the critical nature of a job done today vs. in 2 weeks. If I had a bank account filled with VC money that I could spend like there’s no tomorrow, then this would be an instance where that type of gluttony could be indulged. We don’t have that luxury.

  • Outcome: a success. I can move on to edit and add new features, more critical to the survival of my app. The cost is that the solution chosen is not currently scalable, but that’s OK. I can survive another day. If this app scales to such an extent that this becomes a problem, I will investigate the Lambda option.

  • We cut a 4-week time sink, which could have stretched even further, down to a 1-day operation that is just good enough. I’ve avoided a big miss: I’ve holed out for par, I’m still in the championship, and I can move on.

Written on November 10, 2020