BEAM scene

Life in the Elixir & Erlang ecosystems.


WHO SUPERVISES THE SUPERVISOR?



  • Published: 2023-04-14
  • Updated: 2023-04-14 15:25:55
  • Author: lgmfred

We've all heard of Erlang/OTP or Elixir/OTP and all the praise that comes with it. For those who have never heard of or tried it (which is a shame!), OTP is the set of libraries, tools, and design principles that ships with Erlang and is just as usable from Elixir. It is well-suited for building distributed, fault-tolerant systems that require high levels of concurrency and scalability. But what truly sets OTP apart is its built-in process supervision and monitoring. If a server process happens to crash, its supervisor jumps in and automatically restarts it. It's like having a personal IT superhero.

Ain't it just the way? You got yourself a real humdinger of a system, all complex and full of different services doing their thing. But without any oversight, one little hiccup and bam! The whole darn thing goes belly up. That is, unless you've got Erlang/OTP on your side. Those supervisors are like guardian angels, swooping in to save your precious reputation when the system inevitably hits a snag. Thanks to these trusty supervisors, you can sit back and enjoy the sweet, sweet nectar of exceptional uptime that Erlang is known for.

To make the most of your supervisors, it's crucial to plan ahead and get your start order and restart strategy on point. While you can't always foresee what will make your processes go haywire, you can take steps to bring them back to life using trusted sources. So instead of blindly trusting a persistent state to be in pristine condition post-crash, go ahead and fetch the building blocks that made up your state in the first place. Think of it like making a cake from scratch, rather than hoping the one you baked last week is still edible.

But wait! Who really supervises the supervisor?
===============================================

Sometimes, recovering from errors is like trying to untangle earphones - frustratingly difficult.
For example, tricky errors can lead to a domino effect of worker crashes, forcing the supervisor to keep restarting them. Unfortunately, this won't fix the underlying issue, and once the restarts exceed the supervisor's maximum restart intensity, it gives up and terminates itself. That failure travels up to the top-level supervisor, which will also eventually reach its limit and hit the self-destruct button. If the application is permanent, that takes the entire virtual machine down with it. It's like a game of Jenga gone horribly wrong.

When the dust settles and heart (a monitoring mechanism) realizes the node is down, it will sound the alarm and launch a shell script to try and fix the problem. If it's a minor hiccup, restarting the Erlang VM might do the trick. But if things are really out of whack, rebooting the computer might be necessary. And if that still doesn't work after a few tries, the script might just give up and call for a human hero to step in and save the day.

What the heck is the heart?
===========================

Well, Erlang has some medical-sounding terminology, as evidenced by Dialyzer, amnesia (no, Mnesia), and now a heart monitor thingy? Heart is just an external program that listens for periodic heartbeats from the Erlang runtime system and runs a command of your choosing when the heartbeats stop arriving. You can start an Erlang VM with a simple throwaway heart command that just prints "restarting node..." when the node terminates:

    erl -heart -env HEART_COMMAND "echo 'restarting node...'"

Brutally terminate the Erlang node by pressing Ctrl + \ and you'll see the message printed in the terminal.

Summary, just for you!
======================

When you start an application, the AC (application controller) summons the application master - like a bouncer - to manage the app's setup and the communication between the top supervisor and the controller. Think Batman and Robin, but for apps. The bouncer calls your app's start/2 callback, which launches the top supervisor via start_link and so links it to the app master.
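
Here is a minimal sketch of that handoff, not code from any real project. The module name demo_app, the registered name demo_sup, and the idle worker are all made up for illustration, and the application and supervisor callbacks are squeezed into one module only to keep the example in a single file; real projects usually split them.

```erlang
-module(demo_app).
-behaviour(application).
-behaviour(supervisor).

-export([start/2, stop/1]).   % application callbacks
-export([init/1]).            % supervisor callback
-export([start_worker/0]).    % start function named in the child spec

%% Called on behalf of the application master when the app starts.
%% It must return {ok, Pid}, where Pid is the top supervisor.
start(_StartType, _StartArgs) ->
    supervisor:start_link({local, demo_sup}, ?MODULE, []).

stop(_State) ->
    ok.

%% Supervisor flags: restart only the crashed child, and give up
%% after more than 3 restarts within 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    Worker = #{id => demo_worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% Hypothetical worker: a bare process that idles forever.
%% spawn_link runs inside the supervisor, so the two are linked.
start_worker() ->
    {ok, spawn_link(fun idle/0)}.

idle() ->
    receive _ -> idle() end.
```

With an application resource file (a hypothetical demo_app.app) whose mod key is {demo_app, []}, application:start(demo_app) ends up calling start/2, after which whereis(demo_sup) returns the pid of the top supervisor.
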
If the top supervisor dies, a permanent app terminates the entire node (possibly to be restarted by heart), while a temporary app simply stops running.

Think of the application master as the overprotective nanny of your app, but one that's a little bit nuts. It watches over its little ones and their little ones, and when things get out of hand, it goes bananas and wipes out the whole family tree. We Erlangers have a dark sense of humor, so it's not unusual to hear us talking about offing our own children. It's like we're in some twisted fairy tale where the wicked nanny reigns supreme!

Yay, we now know who supervises the supervisor! Hell, yes. Pop the champagne because we don't whine about things failing!
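
To watch the whole chain play out without sacrificing your node, here's a rough sketch under a few assumptions: the module restart_chain_demo and its deliberately flaky worker are invented for this post, the app spec is loaded from a tuple instead of a .app file, and the app is started as temporary so that only the app dies. The worker keeps crashing, the supervisor exceeds its restart intensity and shuts down, and the application stops with it.

```erlang
-module(restart_chain_demo).
-behaviour(application).
-behaviour(supervisor).

-export([run/0]).
-export([start/2, stop/1]).        % application callbacks
-export([init/1]).                 % supervisor callback
-export([start_flaky_worker/0]).   % start function named in the child spec

%% Load an in-memory app spec, start the app as temporary, and poll
%% until the application controller no longer lists it as running.
run() ->
    AppSpec = {application, restart_chain_demo,
               [{description, "restart escalation sketch"},
                {vsn, "0.1.0"},
                {mod, {?MODULE, []}},
                {applications, [kernel, stdlib]}]},
    _ = application:load(AppSpec),   % ignore already_loaded on reruns
    ok = application:start(restart_chain_demo, temporary),
    wait_until_stopped(50).

start(_StartType, _StartArgs) ->
    supervisor:start_link(?MODULE, []).

stop(_State) ->
    ok.

%% Tolerate at most 2 restarts within 5 seconds, then give up.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 2, period => 5},
    Worker = #{id => flaky_worker,
               start => {?MODULE, start_flaky_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% Hypothetical worker that starts fine and crashes 100 ms later.
start_flaky_worker() ->
    {ok, spawn_link(fun() -> timer:sleep(100), exit(boom) end)}.

%% Poll every 100 ms, up to N times.
wait_until_stopped(0) ->
    still_running;
wait_until_stopped(N) ->
    Running = [App || {App, _Desc, _Vsn} <- application:which_applications()],
    case lists:member(restart_chain_demo, Running) of
        false -> stopped;
        true  -> timer:sleep(100), wait_until_stopped(N - 1)
    end.
```

Calling restart_chain_demo:run() from the shell should return stopped after a burst of supervisor reports, with the node still alive. Passing permanent instead of temporary to application:start/2 is the variant that takes the node down, which is where heart earns its keep.
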


TAGS:

elixir

erlang


Comment by beam-being on 2023-04-14 20:23:23

Erlangers have the darkest sense of humor, indeed.


Comment by mvkvc on 2023-04-14 20:26:20

Watchmen great movie




1. The system must be able to handle very large numbers of concurrent activities.

http://www.erlang.org/ http://clojerl.org/ http://joxa.org/

2. Actions must be performed at a certain point in time or within a certain time.

https://github.com/kapok-lang/kapok https://github.com/the-concurrent-schemer/scm

3. Systems may be distributed over several computers.

https://github.com/alpaca-lang/alpaca https://caramel.run/ https://cuneiform-lang.org

4. The system is used to control hardware.

https://wende.github.io/elchemy/ https://github.com/etnt/eml https://github.com/kjnilsson/fez https://github.com/fika-lang/fika

5. The software systems are very large.

https://gleam.run https://github.com/hamler-lang/hamler https://github.com/etnt/Haskerl

6. The system exhibits complex functionality such as, feature interaction.

https://github.com/lenary/idris-erlang https://github.com/chrrasmussen/Idris2-Erlang https://purerl.fun/ https://github.com/rufus-lang/rufus

7. The systems should be in continuous operation for many years.

https://github.com/gfngfn/Sesterl http://efene.org/ http://elixir-lang.org/ https://github.com/bragful/ephp

8. Software maintenance (reconfiguration, etc) should be performed without stopping the system.

https://github.com/joearms/erl2 https://github.com/rvirding/erlog https://github.com/tonyrog/ffe https://github.com/marianoguerra/interfix

9. There are stringent quality, and reliability requirements.

https://github.com/rvirding/luerl https://otpcl.github.io/ http://reia-lang.org/ https://github.com/zotonic/template_compiler https://github.com/extend/xerl

10. Fault tolerance both to hardware failures, and software errors, must be provided.

Bjarne Däcker. Concurrent functional programming for telecommunications: A case study of technology introduction. November 2000. Licentiate Thesis.