In Node, execute ‘require’ as few times as possible

More often than not, if someone says don’t do this, I need to try it out for myself. It’s curiosity: I like my knowledge visceral and first-hand.

Like most developers carrying C# and Java experience into Node.js, I naturally declare my variables as late as possible, often at the point of assignment, because this makes the code more legible. You can see what the variable is and what its value is, and you have minimized its scope; all of these things are good. Of course, these languages enforce using (C#) and import (Java) as the top statements in code units. Node.js does not enforce this, and because JavaScript is dynamically typed, its require statements don’t just pull types into scope; they return a value that you assign to a variable. Well, it was not long before I started writing code that contained require statements in the body of functions, and this is not advisable from a performance perspective.

Node is asynchronous and single-threaded (for problem-state code), and we should always favor asynchronous code in Node as a result, but require statements are not asynchronous. A require statement needs the code unit to be located and pulled into the unit you are executing, which is a lot of I/O. The unit must then be compiled and run as well, so that the next line of code you execute sees the environment you expect.

Now it makes sense that this is all cached in the system somewhere, so running over the same ‘require’ statement more than once should not cause too much of an issue, but what is significant and what is not significant? Right, it’s time to test it!
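Before timing anything, the caching itself can be checked with a small sketch. It writes a throwaway module to a temporary directory and requires it twice; the file name counter.js and the global loadCount are made up for this illustration and are not part of the test that follows.

```javascript
// Sketch: a module body runs once, however often the module is required.
// The temporary module and the global counter are assumptions for this demo.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a throwaway module that counts how many times its body is evaluated.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'require-demo-'));
const modulePath = path.join(dir, 'counter.js');
fs.writeFileSync(
  modulePath,
  'global.loadCount = (global.loadCount || 0) + 1;\n' +
  'module.exports = { loadedAt: Date.now() };\n'
);

const first = require(modulePath);  // located, read, compiled and run
const second = require(modulePath); // resolved again, then served from the cache

// require.cache is keyed by the resolved filename.
const isCached = require.resolve(modulePath) in require.cache;

console.log(global.loadCount); // 1
console.log(first === second); // true
console.log(isCached);         // true
```

The second call still has to resolve the path and look it up in the cache, and that residual work is what the timing test measures.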

I wrote out a simple test and executed it using Node v6.7.0.

Simple Test

/**
 * Quickly compare the impact of multiple executions of require
 */
// true: require once at the top of the unit; false: require inside payload() on every call
let inline = true;
let j = 0;
if (inline) {
  var express = require('express');
  var path = require('path');
  var favicon = require('serve-favicon');
  var logger = require('morgan');
  var cookieParser = require('cookie-parser');
  var bodyParser = require('body-parser');
  
  var index = require('./routes/index');
  var users = require('./routes/users');
}

let start = new Date().getTime();

for (let i = 0; i < 10000; i++) {
  payload();
}

let elapsed = new Date().getTime() - start;
console.log(`Elapsed time=${elapsed}ms`);


function payload() {
  if (!inline) {
    var express = require('express');
    var path = require('path');
    var favicon = require('serve-favicon');
    var logger = require('morgan');
    var cookieParser = require('cookie-parser');
    var bodyParser = require('body-parser');
  
    var index = require('./routes/index');
    var users = require('./routes/users');
  }
  j = j + 1;
}

The results are rather interesting. For 10,000 executions, the elapsed time is either ~500ms (require inside the function) or ~2ms (require at the top of the unit).

Results Interpretation:

Whether you set the inline variable to true or false, the function payload is called, the if statement is tested and the integer j is incremented. Setting inline to true executes the require calls once, and setting it to false executes them 10,000 times. Requiring these modules also requires many more modules in the background.

There are 8 top-level require calls, but these in turn call others, 91 others in fact. Let’s assume that the simple act of requiring the top-level modules requires all of the children once. That makes ~99 require executions per call of payload. For simplicity’s sake, let’s consider this to be 10 top-level require calls and 100 individual require calls.

If you are an optimist and want to support placing the require calls within functions, so that they may be executed more than once during the life of the application, you can claim that the overhead is very low per execution – ~500 milliseconds over 10,000 calls of payload is 0.05 milliseconds per call, which works out to 0.0005 milliseconds per individual require operation or 0.005 milliseconds per top-level require operation.

If you are a pessimist, you see a 99.6% difference between the two runs, and that within a single process that takes just 0.5 seconds to execute.
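Both readings come from the same arithmetic, which can be reproduced in a few lines. The figures are the rounded ones quoted above (~500ms, ~2ms, 10 top-level and 100 individual require calls); the variable names are only for illustration.

```javascript
// Rounded figures from the test run described in the text.
const slowMs = 500;            // require inside payload()
const fastMs = 2;              // require at the top of the unit
const iterations = 10000;      // calls of payload()
const topLevelPerCall = 10;    // 8 rounded up
const individualPerCall = 100; // 99 rounded up

// The optimist's view: cost per operation.
const msPerCall = slowMs / iterations;
const msPerTopLevel = msPerCall / topLevelPerCall;
const msPerIndividual = msPerCall / individualPerCall;

// The pessimist's view: relative difference between the two runs.
const percentDifference = ((slowMs - fastMs) / slowMs) * 100;

console.log(msPerCall.toFixed(4));         // 0.0500
console.log(msPerTopLevel.toFixed(4));     // 0.0050
console.log(msPerIndividual.toFixed(4));   // 0.0005
console.log(percentDifference.toFixed(1)); // 99.6
```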

I am a pessimist – lesson internalized – all require statements will be placed at the top of each unit as far as possible from here on out.
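As a minimal sketch of that habit, the require sits at the top of the unit and the function only uses the result. The helper describeFile is hypothetical, and the built-in path module stands in for whatever a real unit would need.

```javascript
// Required once, when this unit is first loaded.
const path = require('path');

// Hypothetical helper: every call reuses the module loaded above
// instead of calling require again.
function describeFile(filePath) {
  return {
    name: path.basename(filePath),
    extension: path.extname(filePath)
  };
}

console.log(describeFile('/tmp/report.txt'));
```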

Martin Fowler Saved My FunctionLength Argument

I am a keen follower of Robert C. Martin’s Clean Code practice. I have pored over it, re-read it and argued each point within it with myself, and lost all the arguments to Uncle Bob’s point of view. However, I have a harder time getting the point on function length across to other developers. Simply by following Uncle Bob’s guidelines, I ended up coding small functions. This happens automatically when you separate out functions at their level of abstraction, but what does that mean?

Level Of Abstraction

Functions do things, and these things can be classified into different levels of abstraction. This classification is divided into three high-level abstractions –

  1. Flow (if statements and conditional logic)
  2. Iteration (for and while loops and recursive repetition)
  3. Assignment (operations that result in assignments)

A function should strive to either orchestrate the logic (flow and iteration) or do the logic (iteration and assignment), and should combine all three only rarely and only in the simplest of cases. If your function is doing all three, it is doing ‘more than one thing’ and has ‘more than one reason to change’, which breaks the single responsibility principle in SOLID. Furthermore, separating code into functions whose names accurately and unambiguously describe what they do is the ultimate in code documentation. It is documentation with teeth, because developers and maintainers will strive to ensure that the function name adequately describes what the function is doing, whereas code comments are not likely to be maintained over time.
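One way to picture the split is the sketch below. The invoice domain and every name in it (sendReminders, isOverdue, buildReminder) are invented for illustration and do not come from Clean Code; it is just one possible reading of the guideline.

```javascript
// Orchestrating: flow and iteration only. It reads as a statement of intent.
function sendReminders(invoices, today) {
  const reminders = [];
  for (const invoice of invoices) {
    if (isOverdue(invoice, today)) {
      reminders.push(buildReminder(invoice));
    }
  }
  return reminders;
}

// Doing: one decision, named after what it decides.
function isOverdue(invoice, today) {
  return !invoice.paid && invoice.dueDate < today;
}

// Doing: assignment only.
function buildReminder(invoice) {
  return {
    to: invoice.customer,
    message: `Invoice ${invoice.id} is overdue`
  };
}

const reminders = sendReminders([
  { id: 1, customer: 'a@example.com', paid: false, dueDate: new Date('2017-01-01') },
  { id: 2, customer: 'b@example.com', paid: true, dueDate: new Date('2017-01-01') },
  { id: 3, customer: 'c@example.com', paid: false, dueDate: new Date('2017-03-01') }
], new Date('2017-02-01'));

console.log(reminders.length); // 1
```

If the rule for what counts as overdue changes, only isOverdue changes; the orchestrating function stays as it is.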

Separation Between Intention (Flow) and Implementation (Doing it)

After reading Martin Fowler’s recent article on Function Length, I have yet another way of explaining the same mental distinction. I quote directly from his article:

The argument that makes most sense to me, however, is the separation between intention and implementation. If you have to spend effort into looking at a fragment of code to figure out what it’s doing, then you should extract it into a function and name the function after that “what”. That way when you read it again, the purpose of the function leaps right out at you, and most of the time you won’t need to care about how the function fulfills its purpose – which is the body of the function.

I read as far as that comment and sprang into action writing this article, because I realized I had created a new mental distinction and discovered one of my Code Heroes (Martin Fowler) doing the exact same thing as another of my Code Heroes (Robert C. Martin). Only then did I read further and discover that more of my Code Heroes (Kent Beck) do the exact same thing, and that Martin Fowler’s personal experiences exactly mirror my own. I continue to quote from his article:

Once I accepted this principle, I developed a habit of writing very small functions – typically only a few lines long [2]. Any function more than half-a-dozen lines of code starts to smell to me, and it’s not unusual for me to have functions that are a single line of code [3]. The fact that size isn’t important was brought home to me by an example that Kent Beck showed me from the original Smalltalk system. Smalltalk in those days ran on black-and-white systems. If you wanted to highlight some text or graphics, you would reverse the video. Smalltalk’s graphics class had a method for this called ‘highlight’, whose implementation was just a call to the method ‘reverse’ [4]. The name of the method was longer than its implementation – but that didn’t matter because there was a big distance between the intention of the code and its implementation.
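The ‘highlight’ example translates loosely into JavaScript as follows. This is an analogy and not the Smalltalk source: the Region class and its pixels array are assumptions made up for the sketch.

```javascript
// Hypothetical stand-in for a black-and-white graphics object.
class Region {
  constructor(pixels) {
    this.pixels = pixels; // 1 = black, 0 = white
  }

  // Implementation: flip every pixel.
  reverse() {
    this.pixels = this.pixels.map(pixel => 1 - pixel);
    return this;
  }

  // Intention: callers ask for a highlight and need not know how it is done.
  highlight() {
    return this.reverse();
  }
}

const region = new Region([0, 1, 1]).highlight();
console.log(region.pixels); // [ 1, 0, 0 ]
```

If highlighting later stopped meaning reverse video, only the body of highlight would change and its callers would not.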

Acknowledgements

Robert C. Martin for so many mental distinctions and Martin Fowler for cementing my resolve to continue on the right path.