Performance problems are best resolved with a methodical approach. Except in cases of clear regressions and other simple situations, resolving poor performance starts as a system-wide debugging affair. It’s important to be able to narrow scope quickly and definitively.
So while strictly not a specific Node.js practice, the Utilization, Saturation, Error (USE) Method is recommended. The USE Method is a simple strategy you can use to perform a complete a check of system health, identifying common bottlenecks and errors. It can be deployed early in the investigation and quickly identify problem areas, which then can be studied in more detail other methodologies, if need be. The strength of USE is its speed and visibility: by considering all resources, you are unlikely to overlook any issues. It is developed by Joyent Performance Engineer Brendan Gregg.
The checklists below break down common systems into components (CPU, Memory, Network, Storage) and specify what tools are available for, say system-wide CPU utilization as a command-line tool.
mpstat
, kstat
, sm-cpuinfo
, zonememstat
, dladm
, etciostat
, top
, uptime
, vm_stat
, netstat
, iostat
, iosnoop
, etcvmstat
, dstat
, mpstat
, sar
, iostat
, etcDTrace is a dynamic tracing framework with the ability to trace from the kernel all the way through your application’s code. It’s extremely powerful, but most importantly, safe to use in production.
You instrument your application by enabling predefined probes in the kernel
(for instance for syscalls like read()
and write()
), with probes you
defined at compile time for your application, or with probes defined at runtime
for your application. This visibility into your applciations performance is a
crucial tool to effectively practice the USE method.
Node comes with a series of built-in probes to instrument common events throughout your application.
gc-start
, gc-done
net-server-connection
, net-stream-end
, net-socket-read
, net-socket-write
http-client-request
, http-client-response
http-server-request
, http-server-respones
By instrumenting these probes you can gain insight in the patterns of your application by observing the latencies of the common tasks you perform. For an example of using these static probes look at nhttpsnoop, which allows you to see accounting of HTTP requests and responses coming into and out of your application.
By using the dtrace-provider module you are able to define DTrace probes at runtime for your Node application. These probes come with little to no overhead for your application, and allow you to customize very quickly what information you will want to observe about your application while it’s running. See bunyan and restify for examples of how these runtime probes can be leveraged.
You can also use other probes built into the system for more sophisticated
analysis. For example, you can trace malloc
or operator new
and get a
count of allocations by stacktrace.
Cloud Analytics is a tool for real-time performance analysis of production systems and applications deployed in the Triton Cloud. Built on DTrace, Cloud Analytics provides extraordinary visualizations into how your application is performing, as well as being able to correlate that information with your other related services.
Using Cloud Analytics you’re able to observe latencies throughout many processes running in your environments, from filesystem and network I/O through the static probes defined for Node. You’re able to visualize your performance characteristics through traditional graph formats, as well as powerful heatmaps which helps identify outliers instead of only viewing the average latency of your application.
Being able to determine where your application is spending its time is critical to successful operation. Using DTrace, its profile probes, and the ustack helper you can get that information.
# dtrace -n 'profile-97/execname == "node" && arg1/{
@[jstack(150, 8000)] = count(); } tick-60s { exit(0); }' > stacks.out
Will sample your process 97 times a second (you can go as high as 4999 times a second), aggregate the stack traces for all the times your process was sampled, and store that result in a file.
Once you’ve profiled your application with DTrace, you can turn that file into a flame graph using stackvis.
$ npm install -g stackvis
$ stackvis dtrace flamegraph-svg < stacks.out > stacks.svg
Read more about profiling Node.js
Your application may use binary add-on modules for connecting to things like databases, external resources, or integration with existing business logic. Under those circumstances you may have need to quantify the latencies for execution of those interactions. Using just a little bit of DTrace know-how you can see the distribution of latencies for arbitrary methods that queue work in the Node event queue.
Read more about tracing latency in add-ons.
DTrace can be used for more than just data collection, including debugging non-fatal buggy program behavior. DTrace can instrument very specific events on the system (like “the close syscall in a Node process returned -1 with EBADF” or any application-level event), and you can use that to take action like stopping the process to take a core dump and then resuming it again. Putting these pieces together makes it possible to gather all kinds of debugging information from a program that’s behaving badly.
Read more about this technique.
The technique of determining the root cause of a programming error after a process has died is known as postmortem debugging. This is an important technique to practice regularly, as being able to quickly and accurately diagnose the cause and not the symptom of a failure will mean fewer iterations of these procedures. Fewer iterations of debugging will mean faster development cycles and thus having confidence in your application logic sooner.
To ability to successfully diagnose the root cause of a software problem after a crash depends on the amount and quality of information you have at your disposal. Often quality of information will trump quantity. Depending on the design of your application you may want to be able to garner the state of as many inter-related processes at the time of the crash. Then you’ll need to be able to correlate that information into something actionable.
Aside from quality logs, it’s important to be able to capture as much of the state of the process that died as possible. A stack trace is a good start, but is probably not enough information to be able to act quickly to solve a problem. So along with the stack trace of where the process was executing, you want a dump of all the memory that was wired at the time of the process’s death. This is functionality is often provided by the operating system, and is referred to as a core file.
Not all environments enable core file generation by default, so make sure that
when deploying you’ve enabled those features, a simple ulimit -c unlimited
in
the shell where the process will be running can be sufficient. On SmartOS-based
environments you are also able to generate a core file even if the process
hasn’t died, you can do this with gcore.
mdb is the modular debugger, it
can be used to debug running processes or with core files. Core files can be
generated in one of two ways, your process abort()
d, or you used
gcore to generate a core file for a running
process.
Generating core files for a running process is a very powerful technique, doing so allows you to inspect your applications state before something has gone wrong. Also, if you’re trying to debug a memory leak having multiple core files allows you to track the changes in the counts and kinds of memory being kept around.
Your Node process may abort()
for a variety of reasons, from a programming
error inside Node itself, or by programmatically invoking abort directly to
ensure a core is generated. There are two ways you can programmatically induce
your Node process to abort, the first is
process.abort which
you can call at anytime in your code. The other is to run your Node process
with --abort-on-uncaught-exception
which tells V8 that if your process
encounters an uncaught exception it should generate a core file.
mdb has a very powerful module designed specifically for Node, which can be
loaded with ::load v8
. Loading this module allows you to see what JavaScript
objects are found in V8’s heap with ::findjsobjects, see the
stack trace from the entry point through JavaScript frames with
::jsstack, as well as a variety of other commands to inspect the
JavaScript state of your process.
Read more about postmortem debugging.
The ::findjsobjects
command in mdb is a crucial utility for determining the
current state of active objects found in the V8 heap. Running without any
parameters will give you a list of representative object addresses, their
counts, the number of properties, the constructor name, and the shape (the
enumerated order of the properties) for the objects.
af821729 1 997 Object: sxw, mxu, xbd, smf, gdl, cxt, potx, ...
9c31d5d5 1192 1 Array
9c31e6c9 265 7 Module: parent, paths, loaded, exports, ...
9c31bebd 268 7 Array
a470b0ad 1603 2 Arguments: callee, length
9c31bd79 184 40 Array
While that in and of itself is helpful for tracking down memory leaks, you can also iterate over those objects and find who is keeping a reference to them.
> 9c31e6c9::findjsobjects | ::findjsobjects -r
9c31e6c9 referred to by a6e8db0d./opt/marlin/node_modules/restify/lib/request.js
9c31e6c9 referred to by 9c3d8871[1]
b9f45325 referred to by a6e8db0d./opt/marlin/node_modules/manta/node_modules/restify/node_modules/bunyan/lib/bunyan.js
b9f45325 referred to by a6ee4431[0]
b9f45325 referred to by a6efd025.parent
af8fd5a5 referred to by a6e8db0d./opt/marlin/node_modules/restify/lib/response.js
af8fd5a5 referred to by 9c3d8871[2]
Read more about debugging memory leaks.
While operating in mdb on a core file from Node, you can use the ::jsstack
command to determine from the entry point through JavaScript frames where the
process was when the core file was created.
> ::jsstack
8047358 node::DTRACE_HTTP_CLIENT_REQUEST+0x447
8047408 _ZN2v88internalL21Builtin_HandleApiCallENS0_12_GLOBAL__N_116BuiltinArgumentsILNS0_21BuiltinExtraArgumentsE1EEEPNS0_7IsolateE+0x157
8047424 0x9f90a336 internal (Code: 9f90a2c1)
8047444 0xb702c976 <anonymous> (as OutgoingMessage._finish) (a6e38fbd)
8047484 0xb702c80b <anonymous> (as OutgoingMessage.end) (a6e38f99)
80474a4 0x9f90db41 <ArgumentsAdaptorFrame>
80474bc 0xbe6e72d6 <anonymous> (as StringClient.write.options.headers) (a4781f81)
...
8047b18 node::MakeCallback+0x11e
8047b88 node::MakeCallback+0x69
8047c18 node::PipeWrap::AfterConnect+0x158
8047c78 uv__stream_io+0xc9
8047ca8 ev_invoke_pending+0x63
8047ce8 uv__run+0x94
8047cf8 uv_run+0x17
8047d58 node::Start+0x1c7
8047d78 main+0x1b
8047d94 _start+0x83
That can be quite powerful information, but used in combination with
::jsframe
you are also able to inspect the arguments that were passed to the
frame as well.
> 80474bc::jsframe -iv
80474bc 0 <anonymous> (as StringClient.write.options.headers) (a4781f81)
file: /opt/marlin/node_modules/restify/lib/clients/string_client.js
posn: position 2553
arg1: b9f08081 (Oddball: "null")
arg2: a47827f9 (JSObject)
And then with that information you’re able to inspect the arguments directly
with ::jsprint
> a47827f9::jsprint
{
_hasBody: true,
_headerSent: true,
_header: "POST /my/jobs/task/live HTTP/1.1
...
In addition to encountering memory leaks from closures in JavaScript you may
find you have to tackle traditional memory leaks of malloc()
s without
corresponding free()
s. To diagnose these sorts of issues run your process
with UMEM_DEBUG=default
set in the environment. This environment variable
will (among other things) begin storing audit information about allocations
happening in your process, that information includes the stack trace for the
allocation.
When you have a core that was run with UMEM_DEBUG
set, you can then in mdb
use the command ::findleaks
which will give you a list of any leaked memory
aggregated by their callstacks.
> ::findleaks
...
CACHE LEAKED BUFCTL CALLER
08072c90 1 0807dd08 buf_create+0x12
08072c90 1 0807dca0 func_leak+0x12
08072c90 1 0807dbd0 main+0x12
Once you have a list of addresses that were leaked, you can use the BUFCTL
address to see the callstack by using the ::bufctl_audit
command.
Having logs from your application during the time of the failure is a
tremendous help, but it’s important to be judicious with your logging. It isn’t
normally feasible to run in production with DEBUG
or TRACE
levels of
logging being enabled. Even if you could run in production with those levels,
you would have to sift through a lot of noise to be able to find the pertinent
information.
Bunyan is a straightforward logging library
for Node.js applications. Bunyan’s output is line delimited JSON, which makes
it easy to consume with your normal unix command line utilities like grep
and
sed
, and also with its own CLI utility, or with the json cli
utility.
Bunyan also comes with built-in support for DTrace, which allows you to keep
your existing log level for your existing destinations (e.g. INFO
) but enable
at runtime a more verbose level (e.g. TRACE
) and be notified in user space of
those log entries, without them also going to your existing destinations and
potentially filling the disks. DTrace is completely safe to use at runtime, so
if enabling higher levels of logging would result in detrimental effect your
system DTrace will exit before doing your system harm.
That is, you’ve already defined in your configuration the level at which you want your application to log, but your process is currently misbheaving and you’re looking to find out more information without potentially restarting the service or increasing the amount of storage where your logs are stored. With bunyan and DTrace you can enable your process to send to you the log level you’re interested in at runtime.
# bunyan -p *
[2013-09-27T18:16:19.679Z] DEBUG: test/1076 on sharptooth.local: something went wrong