Thursday, 5 July 2018

macOS - The Dark Side with Mojave

Dark UI has always excited me. It brings a touch of sophistication to a product. For example, Final Cut Pro has a dark interface, and so do trading terminals. And finally, with macOS Mojave, macOS officially includes a dark mode. Dark mode in Xcode 10 is superb, and Playgrounds are getting better as notebooks for ML work. Can't wait for the official release of these.

jamf Saga Continues but Little Snitch to the Rescue

In my previous post, I explained jamf and its privacy implications. jamf is very configurable. Since writing to disk is a potential problem, as the user can readily inspect it, an alternative technique is to ping the command and control center every time the computer unlocks, wakes from sleep, boots, and such. It is a network call, and unless one monitors outbound traffic, one does not know that this is happening. And in my humble opinion, there is nothing better than Little Snitch to monitor such activities. It is the missing firewall for macOS, and it gives me peace of mind, knowing that there are no intruders on my local system. The built-in firewall only filters inbound connections, and only per app. Little Snitch has much finer granularity (IP, domain, app, time based, profile based, inbound, outbound etc.). Monitoring at the network gateway is a story of its own, which requires a different technique.

If one does not want to use Little Snitch, a quicker option would be to map the domain that jamf pings to the loopback address in /etc/hosts, though I am not very sure whether jamf will honour that. Care must be taken with Little Snitch, because its database logs all network connections. So any secure access that should not be logged has to be removed from its database, which can safely be done from the UI. That said, the database is encrypted and the encryption key is stored securely in the Keychain.

Using ProtonMail with Apple Mail App

Using the ProtonMail Bridge, emails leaving and entering a local system configured with the Apple Mail app are encrypted and decrypted transparently. However, the caveat is that only one ProtonMail profile should be present per account under Profiles. If there are multiple ones, Apple Mail gets confused, cannot pick one among them, and takes the mailbox offline. If you change account details on the ProtonMail website, use the latest profile and remove the older ones under System Preferences -> Profiles.

Sunday, 24 June 2018

Ambling on a Sunday Evening

It is not that I know very many people who enjoy ambling through ruins. A very strange hobby, if I may say so. I do enjoy long walks, alone. It was a rather boring evening today when I decided to go for a walk without any specific route in mind to start with.

In the mist of evening drizzle, I ambled my way to the park enjoying the drops of sprinkle, the lovely breeze whispering, empty pavements through the horizon, leaves dancing, a breath of fresh air to soothe the mind, bringing colourful memories.

I sat near the tree, watching the wind make the water dance, the waves playing music. As time passed by, I came to realise that certain understandings were rather misunderstandings, that some things are better left alone, and that no matter what I could say, there is no way to change the past. As the evening passed by, another page in history was written and memories woven.




Dedicated to a very special person, of whom an admirer I am.

Tuesday, 19 June 2018

Design Fundamentals from NID NODE

I had registered for the Design Fundamentals course from the National Institute of Design's NODE program last year. It took quite some time to actually finish the course, as I was caught up in many things. As the title suggests, the course focuses more on the theoretical aspects of design. The course fee has almost doubled now; nevertheless, it's worth it. Of the many lessons, I think Analytical Drawing is cool, and so is Composition. And among the background music, I liked Stop - Ghost K (cheesepuff piano remix) the most. If one is already a design practitioner, this course works as a refresher.

Monday, 18 June 2018

SimpleMind for Mind Mapping

SimpleMind is an amazing mind mapping software. An example of a mind map created using SimpleMind is given below.



SimpleMind updates its apps for macOS and iOS with new features and enhancements. The iOS app can sync with the macOS app over the local network without having to go through a third-party cloud service. This does not work with the Android app, however, so I guess they use the Bonjour protocol.

I have looked into other mind mapping software, but I find this app simpler, more powerful and more robust. The UI is smooth and the mind maps it produces look even nicer. The themes and customization options are rich enough that I don't miss anything a mind mapping software requires.

Mind mapping is a visualisation tool for those who prefer to note down ideas that way. The mobile app helps to quickly brainstorm and visualise ideas, and at the end we can see the whole interconnection of topics, which is a lot easier to keep in mind than reams and reams of text.

The .smmx format is a proprietary file format for storing SimpleMind mind maps. We can export to other formats like OPML in case one needs to use another app, but any rich content information will be lost. But I don't see a need for that anyway.

Conference Summary - Building Data products at Uber

This is my summary of the HasGeek Open House conference on Building Data Products at Uber, by Hari Subramanian, held on the 15th of this month.



1. Data size is in petabytes.
2. Results found in staging are not quite the same when using the same model in production, due to various factors.
3. For deep learning, TensorFlow is used. Results found in AWS and GCP are different.
4. They have built their own BI tools for visualisation.
5. Hive is extended in-house. Hive and Spark overlap to a certain extent. A few MapReduce jobs are still in use, which is why Hive is used.
6. Uses its own datacenters.

The talk was a high-level overview of how Uber uses ML.

Monday, 14 May 2018

Upgraded to HTTPS

The blog has been upgraded to HTTPS. Access it using https://www.qlambda.com.
The previous URL http://www.qlambda.com redirects to the above.

Sunday, 13 May 2018

Analysis of Flocking Patterns and Relations

Movement of humans and analysing their movement patterns is an interesting problem. It can be compared to flocking behaviour in species like birds, or to swarms. Brownian motion can be used to model such collective movements. Representing the raw data of human movement across locations in tabular form, or with a relational model, is inefficient. So modelling such data on any cloud platform without a graph-based model is limited. An example is BigQuery. The advantage of using BigQuery is that Google takes care of the infrastructure and the massive data streams, and it is NoSQL at the storage layer, but it is still tabular in nature, and a relational model is not the most apt way to model location data and movement patterns, or to make sense of the information for further use in recommendation engines and such.
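To make the Brownian motion idea concrete, here is a minimal Java sketch that approximates 2D Brownian motion for a group of walkers by adding independent Gaussian steps at each tick. The class name, the step scale `sigma`, the time step `dt` and the fixed seed are illustrative assumptions, not taken from any platform; a real flocking model would add interaction terms (alignment, cohesion) on top of this noise.

```java
import java.util.Random;

/** Hypothetical sketch: discrete-time approximation of 2D Brownian motion for N walkers. */
class BrownianWalk {
    /**
     * Returns positions[step][walker][axis]. Every walker starts at the origin and moves by
     * independent Gaussian increments with standard deviation sigma * sqrt(dt) per axis.
     */
    static double[][][] simulate(int walkers, int steps, double sigma, double dt, long seed) {
        Random rng = new Random(seed);            // fixed seed so runs are reproducible
        double scale = sigma * Math.sqrt(dt);     // Brownian increments scale with sqrt(dt)
        double[][][] path = new double[steps + 1][walkers][2];
        for (int t = 1; t <= steps; t++) {
            for (int w = 0; w < walkers; w++) {
                path[t][w][0] = path[t - 1][w][0] + scale * rng.nextGaussian();
                path[t][w][1] = path[t - 1][w][1] + scale * rng.nextGaussian();
            }
        }
        return path;
    }

    /** Mean squared displacement from the origin at step t, averaged over all walkers. */
    static double meanSquaredDisplacement(double[][][] path, int t) {
        double sum = 0;
        for (double[] p : path[t]) {
            sum += p[0] * p[0] + p[1] * p[1];
        }
        return sum / path[t].length;
    }
}
```

For pure Brownian motion the mean squared displacement grows roughly linearly with time (about 2 * sigma^2 * steps * dt in two dimensions), which is one simple statistic to compare against observed movement data.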

Looking at the various graph databases at present, OrientDB looks very capable and performant compared to the well-known Neo4j. AllegroGraph is cool if we use RDF and SPARQL, or Prolog for reasoning, but it does not support Gremlin. I think Prolog makes up for that, though Gremlin support would have been a nice addition. The advantage of using an Apache TinkerPop-enabled data system is that we can use any backing datastore, like OrientDB, Neo4j or Apache Spark, without having to use the datastore's own DSL. The Gremlin graph traversal language is to graph databases what SQL is to relational datastores, and it makes working with such systems a pleasant experience instead of having to fiddle with a different DSL for each datastore one encounters.
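As a rough illustration of why a graph model fits movement data, the Java sketch below stores people and locations as vertices and visits as edges in a plain in-memory adjacency map, then answers "who visited the same places as X?" with a two-hop traversal. This is not TinkerPop or any database API; the class and method names are hypothetical. In Gremlin the same question would read roughly as `g.V().has('name','alice').out('visited').in('visited').dedup()` (which would also return alice herself unless filtered out).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory sketch: people and locations as vertices, "visited" as edges. */
class VisitGraph {
    // person -> locations visited (out-edges), and location -> visitors (in-edges)
    private final Map<String, Set<String>> visited = new HashMap<>();
    private final Map<String, Set<String>> visitors = new HashMap<>();

    void addVisit(String person, String location) {
        visited.computeIfAbsent(person, k -> new LinkedHashSet<>()).add(location);
        visitors.computeIfAbsent(location, k -> new LinkedHashSet<>()).add(person);
    }

    /** Two hops: person -> visited locations -> other visitors of those locations. */
    Set<String> coVisitors(String person) {
        Set<String> result = new TreeSet<>();   // sorted and de-duplicated
        for (String location : visited.getOrDefault(person, Collections.emptySet())) {
            for (String other : visitors.getOrDefault(location, Collections.emptySet())) {
                if (!other.equals(person)) {
                    result.add(other);
                }
            }
        }
        return result;
    }
}
```

The same question over a relational table of visits needs a self-join per hop, which is the inefficiency the paragraph above refers to.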

Tuesday, 8 May 2018

Inspiring are those words - Apple Special Event 2017

The voice of Steve Jobs at the opening of the Apple Special Event 2017: thought-provoking, inspiring are those words.
There's lots of ways to be as a person.
And some people express their deep appreciation in different ways.
But one of the ways that I believe people express their appreciation to the rest of humanity is to make something wonderful and put it out there.
And you never meet the people, you never shake their hands, you never hear their story or tell yours.
But somehow in the act of making something with a great deal of care and love, something's transmitted there.
And it's a way of expressing to the rest of our species our deep appreciation.
So we need to be true to who we are and remember what's really important to us.
Captivates the mind in more ways than one.

Saturday, 5 May 2018

Data Archival with Optical Disk to withstand EMP

Optical discs are a safe choice if we need data to withstand an EMP (electromagnetic pulse) attack. And Blu-ray discs are cost effective when compared with magnetic storage and shielding combined. Say for a 1 TB external hard disk drive, the current approximate price is USD 50. A 5-pack of 25 GB recordable Blu-ray discs (125 GB in total) costs around USD 15, which per gigabyte is more than the 1 TB drive (about USD 0.12/GB versus USD 0.05/GB). Counting the shielding that magnetic storage needs, and even accounting for the initial price of the disc burner and software, Blu-rays are cost effective as we archive more data. Now, getting the rest of the gear to work after an EMP attack is a different matter; the data, however, is preserved.
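The per-gigabyte arithmetic behind this comparison can be sketched in Java, using only the approximate prices quoted above (USD 50 for 1 TB, taken here as 1000 GB, and USD 15 for 5 x 25 GB). No figures are given for the burner, software or shielding, so the fixed cost is left as a caller-supplied parameter; the class and method names are illustrative.

```java
/** Illustrative cost arithmetic using the approximate prices quoted in the post. */
class ArchiveCost {
    static final double HDD_PRICE_USD = 50.0;            // 1 TB external hard disk
    static final double HDD_CAPACITY_GB = 1000.0;        // 1 TB taken as 1000 GB
    static final double BD_PACK_PRICE_USD = 15.0;        // 5-pack of recordable Blu-ray discs
    static final double BD_PACK_CAPACITY_GB = 5 * 25.0;  // 125 GB per pack

    static double costPerGb(double priceUsd, double capacityGb) {
        return priceUsd / capacityGb;
    }

    /**
     * Cost of archiving dataGb on Blu-ray: whole packs needed plus a one-off fixed cost
     * (burner, software). The fixed cost is an assumption supplied by the caller.
     */
    static double bluRayTotal(double dataGb, double fixedCostUsd) {
        int packs = (int) Math.ceil(dataGb / BD_PACK_CAPACITY_GB);
        return fixedCostUsd + packs * BD_PACK_PRICE_USD;
    }
}
```

With these figures, 1 TB on Blu-ray needs 8 packs, about USD 120 before the burner, against about USD 50 for the bare drive, so whether the discs come out ahead depends on what shielding the drive would cost.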

Friday, 4 May 2018

Run LFE on Android with Termux

Termux on Android runs Erlang, as described in a previous post. I tried to compile and install LFE within Termux, but it failed, probably due to the path prefix, after which erl started crashing as well.

We can, however, pre-compile it and transfer the bin, ebin and include directories, with the same directory structure, to $HOME/bin/lfe. Then add the path to .bashrc.
export LFE_BIN=$HOME/bin/lfe/bin
export PATH=$HOME/bin:$LFE_BIN:$PATH
Set the executable permission on the files in lfe/bin with chmod +x $HOME/bin/lfe/bin/lfe*.
Now run lfe which will start the LFE REPL!



Amazing, isn't it? We now have a full-blown Lisp along with the power of Erlang at our fingertips.

Gorilla-REPL with boot

Gorilla-REPL is to Clojure what IPython notebooks are to Python. It provides a web page with markdown and code where one can experiment with various data sets and save the results, and use it for code snippets, documentation and such. And it works pretty decently. Gorilla-REPL provides Leiningen plugin integration. To use it with boot, we can use the sooheon/boot-gorilla library. A sample boot task is shown below.
#!/usr/bin/env boot

(set-env!
    :source-paths #{"src" "test"}
    :resource-paths #{"resources"}
    :dependencies '[[org.clojure/clojure "1.9.0"]
                    [sooheon/boot-gorilla "0.1.1-SNAPSHOT"]])

(require '[sooheon.boot-gorilla :refer [gorilla]])

(deftask grepl []
  (gorilla :port 8002 :ip "127.0.0.1" :block true))
Start Gorilla-REPL using boot grepl and we will see something similar to the below.
Gorilla-REPL: 0.4.1-SNAPSHOT
Started nREPL server on port 52660
Running at http://127.0.0.1:8002/worksheet.html .
Ctrl+C to exit.
<< started Gorilla REPL on http://127.0.0.1:8002 >>
Navigate to http://127.0.0.1:8002/worksheet.html to see the Clojure notebook. Required dependencies can be added to the boot build script, and they will be available in the worksheet. Worksheets are saved as Clojure code. We can use the gorilla-repl viewer to view these files online. Sample worksheet: fibonacci.clj.

Full code added to codebook repo at GitHub.

Thursday, 3 May 2018

Install OneDrive on macOS High Sierra

With a clean install of macOS High Sierra, the default file system for SSDs is APFS (Apple File System), and to make it Unix-like, the usual choice is the case-sensitive variant, encrypted or otherwise. Since OneDrive expects a case-insensitive file system, we cannot set up local sync on such an APFS volume. And even if we create a case-insensitive APFS volume, OneDrive fails to set up sync. The quick and easy way is to create a new partition that is not an APFS volume and format it as exFAT. Use this as the directory for OneDrive sync and it will work fine.

Access non-static enum in Clojure

Let's say there is a Java class with a nested enum, KeyPlacement, and a non-static field of that enum type, as below.
package com.example;

public class Encrypter {

    public enum KeyPlacement {
        PEER,
        INLINE
    }

    private KeyPlacement keyPlacement;

    public KeyPlacement getKeyPlacement() {
        return this.keyPlacement;
    }

    public void setKeyPlacement(final KeyPlacement newKeyPlacement) {
        this.keyPlacement = newKeyPlacement;
    }
}

Setting the keyPlacement value from Clojure can be done as follows.
(ns core
  (:import [com.example Encrypter Encrypter$KeyPlacement]))

(doto (Encrypter.)
  (.setKeyPlacement Encrypter$KeyPlacement/INLINE))


When the above source is compiled, nested classes, enums etc. are generated as separate classes whose names join the outer and inner names with $, as in Encrypter$KeyPlacement. So we can import them by that name and access them from Clojure.
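The `$` naming can be checked from the Java side with a small reflection sketch. It assumes a copy of the Encrypter class above compiled without a package declaration (with the post's com.example package, the names gain a com.example. prefix), and the helper class name NestedNames is hypothetical. A nested enum is implicitly static in Java, and getName() returns the binary name, the same kind of name that Clojure's :import expects.

```java
/** Copy of the Encrypter class from the post, without the package declaration. */
class Encrypter {
    public enum KeyPlacement { PEER, INLINE }   // nested enums are implicitly static

    private KeyPlacement keyPlacement;

    public KeyPlacement getKeyPlacement() { return this.keyPlacement; }

    public void setKeyPlacement(final KeyPlacement newKeyPlacement) { this.keyPlacement = newKeyPlacement; }
}

/** Hypothetical helper showing the binary name of the nested enum. */
class NestedNames {
    /** For example "Encrypter$KeyPlacement" when Encrypter is in the default package. */
    static String binaryName() {
        return Encrypter.KeyPlacement.class.getName();
    }

    /** Loads the enum by outer binary name + "$" + inner name and returns the named constant. */
    static Object lookupConstant(String constant) {
        try {
            Class<?> cls = Class.forName(Encrypter.class.getName() + "$KeyPlacement");
            for (Object c : cls.getEnumConstants()) {
                if (((Enum<?>) c).name().equals(constant)) {
                    return c;
                }
            }
            throw new IllegalArgumentException("No constant " + constant);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```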

Saturday, 14 April 2018

Live reload ClojureScript & JavaScript during Cordova app development

Using ClojureScript for Cordova app development is described in a previous post. This post expands on how to do live reloading when code changes, and on writing JavaScript unit tests with QUnit. A sample code structure is given below.
example-cordova-app
├── Gruntfile.js
├── config.xml
├── hooks
├── node_modules
├── package-lock.json
├── package.json
├── platforms
│   └── browser
│       ├── browser.json
│       ├── ...
├── plugins
├── res
├── example-cordova-cljs
│   ├── out
│   ├── project.clj
│   ├── resources
│   ├── src
│   │   └── my_app
│   │       └── core.cljs
│   ├── target
│   └── test
│       └── my_app
│           └── core_test.clj
└── www
    ├── css
    │   └── style.css
    ├── img
    ├── index.html
    ├── js
    │   ├── app.js
    │   ├── libs
    │   ├── main.js
    └── test
        ├── qunit.css
        ├── qunit.js
        ├── test.html
        └── tests.js
We can use the browser platform for quick testing during development, and cordova-plugin-browsersync for live reloading of the www folder.
1. Install cordova-plugin-browsersync.
cordova plugins add cordova-plugin-browsersync
2. Once the plugin is installed, we can start the watcher from terminal.
cordova run browser -- --live-reload
When the ClojureScript code changes, it is compiled and the output is placed into www; the above plugin detects the change and reloads. Refresh the browser if needed, and the latest code changes will be reflected. This is also useful when we mix JavaScript and ClojureScript.

Unit Testing with QUnit
We can also write ClojureScript tests, but that is a different workflow. Here, place the JavaScript test scripts under the www/test folder. A sample test.html is shown below.
<!doctype html>
<html>
<head>
    <link rel="stylesheet" type="text/css" href="qunit.css">
    <script type="text/javascript" src="qunit.js"></script>
    <script type="text/javascript" src="tests.js"></script>
    <title>Testsuite</title>
</head>
<body>
    <div id="qunit"></div>
    <div id="qunit-fixture"></div>
</body>
</html>
Test scripts can go into tests.js.
// tests.js
if (document.readyState === 'complete') {
    test();
} else {
    window.addEventListener('load', test, false);
}

function test() {
    QUnit.module("test");
    QUnit.test("Example", function (assert) {
        assert.ok(true, "ok is for boolean test");
        assert.equal(1, "1", "comparison");
    });
}
We will use grunt to run these tasks. Add both live reload and unit test tasks in Gruntfile.js.
module.exports = function(grunt) {
    grunt.initConfig({
        pkg: grunt.file.readJSON('package.json'),
        qunit: {
            files: ['www/test/**/*.html']
        },
        exec: {
            start: {
                command: 'cordova run browser -- --live-reload'
            }
        }
    });

    grunt.loadNpmTasks('grunt-contrib-qunit');
    grunt.loadNpmTasks('grunt-exec');

    grunt.registerTask('test', ['qunit']);
    grunt.registerTask('start', ['exec:start']);
};
The required dependencies added to package.json are as follows.
{
    "name": "com.qlambda.example.app",
    // ...
    "main": "main.js",
    "scripts": {
        "start": "cordova run browser -- --live-reload",
        "test": "grunt test"
    },
    "dependencies": {
        "browser-sync": "^2.23.6",
        "cordova-browser": "^5.0.3",
        "cordova-plugin-browsersync": "^1.1.0",
        "cordova-plugin-whitelist": "^1.3.3",
        // ...
    },
    "cordova": {
        "plugins": {
            "cordova-plugin-whitelist": {},
            "cordova-plugin-browsersync": {}
        },
        "platforms": [
            "browser"
        ]
    },
    "devDependencies": {
        "grunt": "^1.0.2",
        "grunt-contrib-qunit": "^2.0.0",
        "grunt-exec": "^3.0.0"
    }
}
The start and test tasks are available as grunt tasks, and the main ones are linked to npm scripts as well.
# grunt
grunt start  # start watcher
grunt test   # run testsuite

# npm
npm start
npm test

Sunday, 8 April 2018

macOS Server is a disappointment

macOS Server is very much a disappointment. The main reason for me to use it is the quick setup of Calendar, Contacts, Notes, DNS and Wiki (which I stopped using), which sync with the rest of the Apple devices within the internal network, without having to install, configure and fiddle with these services separately. With each new update of the Server app, Apple keeps removing features, which raises the question of whether it is going to be discontinued, which seems very likely.


With the latest update, 5.6, more services are being removed in preparation for the migration to alternative services. The support article HT208312 lists the following services to be removed in fall 2018.
DHCP, DNS, VPN, Firewall, Mail Server, Calendar, Wiki, Websites, Contacts, NetBoot/NetInstall, Messages, RADIUS, AirPort Management
Apple recommends installing those services separately, rendering the macOS Server app useless (to me).

Thursday, 5 April 2018

Simple cowsay in Clojure

A simple version of cowsay in Clojure.
(ns fortune
  "Fortune fairy."
  (:require [clojure.pprint :as pprn]
            [clojure.string :as str])
  (:import [java.util.concurrent ThreadLocalRandom]))

(def tale ["I'll walk where my own nature would be leading: It vexes me to choose another guide."
           "Every leaf speaks bliss to me, fluttering from the autumn tree."
           "I see heaven's glories shine and faith shines equal."
           "I have to remind myself to breathe -- almost to remind my heart to beat!"
           "I’ve dreamt in my life dreams that have stayed with me ever after, and changed my ideas: they’ve gone through and through me, like wine through water, and altered the colour of my mind."])

(defn gen-random [lb ub]
  (-> (ThreadLocalRandom/current)
      (.nextInt lb ub)))

(defn gen-rand-txt []
  (nth tale (gen-random 0 (count tale))))

(def cowsay-body
"        \\   ^__^
         \\  (oo)\\_______
            (__)\\       )\\/\\
                ||----w |
                ||     ||")

(defn cowsay-hr [width]
  (println (pprn/cl-format nil "+ ~v@<~a~> +" width (apply str (repeat width "-")))))

(defn cowsay-txt-fmt [msg width]
  (println (pprn/cl-format nil "| ~v@<~a~> |" width (str/trim msg))))

(defn cowsay
  ([]
    (let [txt (gen-rand-txt)]
      (cowsay txt 0 21 21 (count txt))))
  ([msg]
    (let [txt (if (empty? msg) (gen-rand-txt) msg)]
      (cowsay txt 0 21 21 (count txt))))
  ([msg width]
    (cowsay msg 0 width width (count msg)))
  ([msg start end width len]
    (when (= start 0) (cowsay-hr width))
    (cond
      (<= end len) (do
                    (cowsay-txt-fmt (subs msg start end) width)
                    (recur msg (+ start width) (+ end width) width len))
      (< start end) (do
                      (cowsay-txt-fmt (subs msg start len) width)
                      (cowsay-hr width)                  
                      (println cowsay-body)))))
The format directives in cl-format are very expressive. Since this is a simple version, it just splits the text every width characters (even mid-word) and left-aligns each line.
boot.user=> (load-file "fortune.clj")
#'fortune/cowsay

boot.user=> (require '[fortune :as fortune])
nil

boot.user=> (fortune/cowsay)
+ --------------------- +
| I see heaven's glorie |
| s shine and faith shi |
| nes equal.            |
+ --------------------- +
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
nil

boot.user=> (fortune/cowsay (first fortune/tale) 0 13 13 (count (first fortune/tale)))
+ ------------- +
| I'll walk whe |
| re my own nat |
| ure would be  |
| leading: It v |
| exes me to ch |
| oose another  |
| guide.        |
+ ------------- +
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
nil
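For comparison, the same fixed-width boxing can be sketched in Java with String.format's left-justify flag (%-Ns) in place of cl-format's ~v@<~a~>. This is a hypothetical port for illustration, not part of the post's code: like the Clojure version it cuts the message every width characters and trims each chunk, but it does not print the cow, and it skips the trailing empty row the Clojure version prints when the message length is an exact multiple of the width.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical Java port of the cowsay box: fixed-width chunks, left-aligned, with borders. */
class CowsayBox {
    static List<String> box(String msg, int width) {
        if (width < 1) {
            throw new IllegalArgumentException("width must be positive");
        }
        String rule = "+ " + String.join("", Collections.nCopies(width, "-")) + " +";
        List<String> lines = new ArrayList<>();
        lines.add(rule);
        for (int start = 0; start < msg.length(); start += width) {
            // cut every `width` characters, then trim each chunk as cowsay-txt-fmt does
            String chunk = msg.substring(start, Math.min(start + width, msg.length())).trim();
            // %-Ns left-justifies the chunk and pads it with spaces to N characters
            lines.add(String.format("| %-" + width + "s |", chunk));
        }
        lines.add(rule);
        return lines;
    }
}
```

Calling box with the second quote and a width of 21 yields the same five rows as the first REPL session above.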

Boot Task
We can further use this as a boot task. Let's say we placed fortune.clj in a scripts directory under the project root, and we add the below snippet to build.boot.
(require '[boot.core :as core] '[boot.pod :as pod])

(def generic-pod (future (pod/make-pod (core/get-env))))

(deftask cowsay
  [m msg VAL str "Message to print"]
  (merge-env! :source-paths #{"scripts"})
  (pod/with-call-in @generic-pod (fortune/cowsay ~msg))
  identity)
This can be run as boot cowsay -m "Every leaf speaks bliss to me, fluttering from the autumn tree.".

Saturday, 31 March 2018

OpenSAML 3 Java - VU#475445 Workaround

As described in the vulnerability note VU#475445, the OpenSAML 3 library (including the Java library) is vulnerable to comment injection in assertions signed using a C14N canonicalization algorithm that ignores comments. Duo Labs found this vulnerability. A quick workaround is below.
; ..
(:require [clojure.string :as str])
; ..

(def ^:dynamic *subject-val*)

(defn get-subject-from-node [nodes i c]
  (when (and (< i c) (not (realized? *subject-val*)))
    (let [node (.item nodes i)
          node-name (.getNodeName node)]
      (when (str/includes? node-name "Assertion")
        (get-subject-from-node (.getChildNodes node) 0 (.getLength (.getChildNodes node))))
      (when (str/includes? node-name "Subject")
        (get-subject-from-node (.getChildNodes node) 0 (.getLength (.getChildNodes node))))
      (when (str/includes? node-name "AuthenticationStatement")
        (get-subject-from-node (.getChildNodes node) 0 (.getLength (.getChildNodes node))))
      (when (str/includes? node-name "NameID")  ; SAML v2
        (deliver *subject-val* (.getTextContent node)))
      (when (str/includes? node-name "NameIdentifier")  ; SAML v1
        (deliver *subject-val* (.getTextContent node)))
      (get-subject-from-node nodes (inc i) c)))
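  ;; non-blocking read: returns nil instead of blocking forever when no NameID/NameIdentifier was found
  (deref *subject-val* 0 nil))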
This extracts the subject from a SAMLResponse in either v1 or v2 format.
(defn get-subject [doc-elem assertion]
  (binding [*subject-val* (promise)]
    (let [name-id (.getNameID (.getSubject assertion))
          sub (.getValue name-id)
          resp-node (.getFirstChild (.getParentNode doc-elem))
          resp-node-childs (.getChildNodes resp-node)
          sub-cve (get-subject-from-node resp-node-childs 0 (.getLength resp-node-childs))]
      (if (= sub sub-cve) sub sub-cve))))
Here we proceed with the OpenSAML 3 unmarshalled object and obtain the subject value by calling (.getValue name-id). But since the library is vulnerable, we also extract the subject value manually by calling (get-subject-from-node resp-node-childs 0 (.getLength resp-node-childs)). If they do not match, we return the sub-cve value, as get-subject-from-node extracts the full subject text, ignoring any comment inserted into the subject value.

Example
<saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">2<!-- foo -->1@example.com</saml:NameID>
An attacker can modify the subject value of a signed assertion as above. Since C14N ignores comments, comments added to the XML after signing are ignored when the signature is recomputed during validation, so the signature still verifies.

NB: the subject 2<!-- foo --> should not be in escaped form in the SAMLResponse.

With this, OpenSAML 3 will give us sub as 1@example.com, and the workaround function will give sub-cve as 21@example.com.
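The workaround relies on standard DOM behaviour: Node.getTextContent() concatenates all descendant text and skips comment nodes, whereas reading a single text child stops at the injected comment. The JDK-only Java sketch below (no OpenSAML involved; the class name is hypothetical) parses the example NameID and shows three views of it. The example above declares no xmlns:saml, so the sketch adds the SAML 2.0 assertion namespace as an assumption to allow namespace-aware parsing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical demo of how the injected comment splits the NameID text. */
class CommentInjectionDemo {
    static final String NAME_ID =
        "<saml:NameID xmlns:saml=\"urn:oasis:names:tc:SAML:2.0:assertion\""
        + " Format=\"urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress\">"
        + "2<!-- foo -->1@example.com</saml:NameID>";

    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // comments are kept as nodes by default
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    /** All text with comments skipped: what the workaround's .getTextContent call sees. */
    static String fullText(Element e) {
        return e.getTextContent();
    }

    /** Only the text node before the injected comment. */
    static String firstTextNode(Element e) {
        Node first = e.getFirstChild();
        return first.getNodeType() == Node.TEXT_NODE ? first.getNodeValue() : null;
    }

    /** Only the text node after the injected comment. */
    static String lastTextNode(Element e) {
        Node last = e.getLastChild();
        return last.getNodeType() == Node.TEXT_NODE ? last.getNodeValue() : null;
    }
}
```

The element has three children (text "2", the comment, text "1@example.com"), so which fragment a vulnerable library reports depends on which text node it reads.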

Sidenote
; ...
(:import [javax.xml.parsers DocumentBuilderFactory]
         [org.xml.sax SAXException])
; ...

(def disallow-doctype-dec "http://apache.org/xml/features/disallow-doctype-decl")
(def external-parameter-entities "http://xml.org/sax/features/external-parameter-entities")
(def load-external-dtd "http://apache.org/xml/features/nonvalidating/load-external-dtd")

(defn get-doc-builder 
  "Returns a document builder, which should be called for each thread as parse is not thread safe"
  []
  (let [doc-builder-factory (DocumentBuilderFactory/newInstance)]
    (.setNamespaceAware doc-builder-factory true)
    ;; prevent XXE
    (.setFeature doc-builder-factory disallow-doctype-dec true)
    (.setFeature doc-builder-factory external-parameter-entities false)
    (.setFeature doc-builder-factory load-external-dtd false)
    (.setXIncludeAware doc-builder-factory false)
    (.setExpandEntityReferences doc-builder-factory false)
    (.newDocumentBuilder doc-builder-factory)))

(defn parse-xml
  "Parse the given xml which can be a input stream, file, URI."
  [xml]
  (try
    (.parse (get-doc-builder) xml)
    (catch SAXException excep
      (throw (ex-info "XML parse exception." {:cause [:err-xml-parse]} excep)))))
Here we use the JAXP parser to parse the XML, and OpenSAML 3 to unmarshal the parsed org.w3c.dom.Document into OpenSAML 3 objects for easy extraction and validation of the SAMLResponse.
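For readers working in plain Java, the same JAXP hardening translates directly. The sketch below sets the same three features and the same two setters as the Clojure get-doc-builder, and creates a fresh DocumentBuilder per call for the thread-safety reason given in its docstring. The class name SafeXml and the choice to wrap parse errors in IllegalArgumentException are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical Java counterpart of get-doc-builder and parse-xml. */
class SafeXml {
    static DocumentBuilder newDocBuilder() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            // prevent XXE: reject any DOCTYPE, never load external DTDs or parameter entities
            f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            f.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            f.setXIncludeAware(false);
            f.setExpandEntityReferences(false);
            return f.newDocumentBuilder();   // DocumentBuilder is not thread safe: one per call
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Parser does not support a required feature", e);
        }
    }

    /** Parses a UTF-8 XML string; documents with a DOCTYPE are rejected. */
    static Document parse(String xml) {
        try {
            return newDocBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("XML parse exception.", e);
        }
    }
}
```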

Friday, 23 March 2018

Everything is a Monad - Formalising Data Flow Pipeline in Clojure with Category Theory

A basic notion of a data flow pipeline is described in a previous post, Data Flow Pipeline in Clojure. Here we will add some formalism to it using category theory, applied in the context of Clojure. To relate to this, we can think of Haskell. Haskell is a language based on the typed lambda calculus, and it implements concepts from category theory. Haskell has its own category, named Hask, whose objects and arrows are bound by laws. Clojure, on the other hand, is closer to the untyped lambda calculus (not in the strict sense, as the reduction model is different; nevertheless, it is a point that differentiates the two languages). Also, no one has defined a particular category in which s-expressions are bound by the laws of category theory, which is not really required anyway. Now to the land of Clojure.

1. First we need to define a category, let's say turtle (name inspired from the first programming language I learned, LOGO).

2. A category consists of a collection of entities called objects, a collection of entities called arrows, and a few other properties (assignments and composition).

3. There are two assignments, source (or domain) and target (or codomain), each of which attaches an object to an arrow. That is, an arrow takes an object in the source and returns an object in the target. This can be represented as:
                    f
  A -----------------------------------> B
source            arrow               target
domain           morphism            codomain
                   map
4. So to model this, let's use clojure.spec.
(require '[clojure.spec.alpha :as spec])

(spec/def ::variant vector?)
(spec/def ::tag keyword?)
(spec/def ::msg string?)
(spec/def ::state (spec/map-of keyword? any?))
(spec/def ::turtle-variant (spec/tuple ::tag ::msg ::state))
Here we defined the object to be a ::turtle-variant, the structure of which is [keyword? string? map?]. We need to check that the arrows take objects that belong to the turtle category; otherwise, the morphism cannot belong to the category.
(defn validate-variant 
  "Check if the given element belongs to the category turtle.
   Given a data structure, check if it conforms to a variant. If it does not, explain the reason for non-conformance."
  [predicate]
  (let [rule (fn [f] (f ::turtle-variant predicate))]
    (if-not (rule spec/valid?)
      (rule spec/explain)
      true)))
5. Some helper functions.
(defn ok
  "Construct an object of type success in turtle which can be passed to a functor. Returns a ::turtle-variant."
  [msg result]
  [:ok msg result])

(defn err
  "Construct an object of type error in turtle which can be passed to a functor. Returns a ::turtle-variant."
  [msg ex]
  [:err msg ex])
6. Now let us create an arrow that consumes an object in the category and returns an object. The returned object must be in the category. Since the arrow takes and returns objects, we can intuitively refer to it as a function or morphism. Here the object in the category is ::turtle-variant, and since the morphism produces the same object, we call it an endomorphism: a mapping of an object in a category to itself. We define a functor such that it performs some transformation but preserves the structure. Below we can see that fetch takes in a ::turtle-variant and returns a ::turtle-variant.
(defn fetch
  "A functor which performs some operation on the object."
  [varg]
  (let [[tag msg {:keys [url opts] :as state}] varg
        ;; http-get is a helper not shown in this post; assumed to return [status body status-code]
        [status body status-code] (http-get url opts)]
    (condp = status
      :ok (ok msg (merge {:res body} state))
      :err (err "Error" {:msg msg :excep body}))))

(fetch (ok "html" {:url "http://example.com" :opts nil}))
7. The turtle category is not fully defined yet, because we still need to define partial composition of arrows. Identity can be defined trivially.
Arw x Arw -> Arw
Now, here is where we differ from most monad libraries out there that try to implement what Haskell does. Here, partial composition does not mean that a function should return a partial. Certain pairs of arrows are compatible for composition to form another arrow. Two arrows
      f                  g
A ---------> B1   B2 ---------> C
are composable, in that order, precisely when B1 and B2 are the same object, and then an arrow
A ---------> C
is formed.

8. If we look at the way we defined the functions and the type constraint on them, this law holds, because here all functions take the same structure and return the same structure, the ::turtle-variant.
At this point, we have defined our category turtle.

9. But from a programmer's perspective, simply having standalone functions is not useful. We need to compose them: call a function, passing args if necessary, take the return value, call another function with the return value of the previous function as its arg, and so on.
(defn endomorph
  "Apply a morphism (transformation) for the given input (domain) returning an output (co-domain) where both domain and
   co-domain belongs to the category turtle. The functors are arrows in the same category. This qualifies as a monad
   because it operates on functors and returns an object. And the associative law holds because in whichever way we compose the functors,
   the result object is ::turtle-variant. This is also a forgetful functor.
   
   Decide whether to continue with the computation or return. All functions in the chain should accept and return a variant,
   which is a 3-tuple."
  ([fns]
    (endomorph fns (ok "" {}))) 
  ([fns init-tup]
    (reduce (fn [variant fun]
      {:pre [(validate-variant variant)]
       :post [(if (reduced? %) (validate-variant (deref %)) (validate-variant %))]}
      (let [[tag _ _] variant]
        (condp = tag
          :err (reduced variant)  ;if tag is :err short-circuit and return the variant as the result of the computation
          :ok (fun variant))))
      init-tup
      fns)))
The endomorph is a monoid. A monoid structure is defined as
(R, *, 1)
where R is a set, * is a binary operation on R and 1 is a nominated element of R, where
(r * s) * t = r * (s * t)    1 * r = r = r * 1
In our case, the set R is the set of functions that operate on functors, and we have defined only one, the endomorph function. This function operates on functors (which are themselves endofunctors here). The binary operation is the reduce function, and the identity is trivial; we can use the constantly and identity functions defined in Clojure for that. The associative law holds because however we group the compositions, the result is the same kind of object, a ::turtle-variant. That is from a theory perspective. But from a programmer's perspective, we cannot do arbitrary ordering, as we need values within the structure. So conceptually it is associative, but the order of computation matters.

10. Now this is also a monad. The classic definition of a monad follows.
All told, a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor.
-- Saunders Mac Lane in Categories for the Working Mathematician
From the above definition, it is clear that the endomorph is a monad, as it operates on endofunctors and is itself a monoid (shown above). It is also a forgetful functor, because some properties are discarded once we apply the transformation.

Example usage:
;; Let's say we have many functions similar to fetch that conform to the contract defined
;; in the :pre and :post of the reducing function in endomorph; we can chain them as below
(endomorph [fetch parse extract transform save] (ok "Init Args" {:url "http://example.com" :xs ".."}))
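The same short-circuiting chain can be sketched in Java, assuming a small Variant class as a stand-in for the 3-tuple and UnaryOperator steps as stand-ins for fetch, parse and save. Endomorph, Variant, run and the step bodies are all hypothetical; the only point is that the first ERR stops the fold, as (reduced variant) does in the Clojure version.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Sketch of the endomorph chaining idea (hypothetical names, not the original code).
final class Endomorph {

    enum Tag { OK, ERR }

    // Stand-in for the 3-tuple [tag message data] used as a variant.
    static final class Variant {
        final Tag tag;
        final String message;
        final Map<String, Object> data;

        Variant(Tag tag, String message, Map<String, Object> data) {
            this.tag = tag;
            this.message = message;
            this.data = Collections.unmodifiableMap(new HashMap<>(data));
        }

        static Variant ok(String message, Map<String, Object> data) {
            return new Variant(Tag.OK, message, data);
        }

        static Variant err(String message, Map<String, Object> data) {
            return new Variant(Tag.ERR, message, data);
        }
    }

    // Folds the steps over the initial variant, like reduce; on ERR it stops
    // early and returns that variant, like (reduced variant).
    static Variant run(List<UnaryOperator<Variant>> steps, Variant init) {
        Variant current = init;
        for (UnaryOperator<Variant> step : steps) {
            if (current.tag == Tag.ERR) {
                break; // short-circuit: the remaining steps never run
            }
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical steps standing in for fetch, parse and save.
        UnaryOperator<Variant> fetch = v -> {
            Map<String, Object> d = new HashMap<>(v.data);
            d.put("body", "<html></html>");
            return Variant.ok("fetched", d);
        };
        UnaryOperator<Variant> parse = v -> Variant.err("parse failed", v.data);
        UnaryOperator<Variant> save = v -> Variant.ok("saved", v.data);

        Map<String, Object> init = new HashMap<>();
        init.put("url", "http://example.com");
        List<UnaryOperator<Variant>> steps = Arrays.asList(fetch, parse, save);
        Variant result = run(steps, Variant.ok("Init Args", init));
        System.out.println(result.tag + " " + result.message); // ERR parse failed
    }
}
```

Checking the tag before each step, rather than after it, mirrors the condp in the reducing function: an ERR passed in as the initial value comes back unchanged.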

After all this proof, I came to the realisation that Leibniz was right about everything being a monad.

Monday, 12 February 2018

JSON Logging with MDC using Log4j2 in Clojure

This grew out of necessity and illustrates JSON logging with MDC in Clojure. It is also generally understood that, at this point in time, log4j2's async performance is better than that of any other logging library out there.

The problem statement: logs from the application and from its included libraries must be JSON formatted, so that they can be sent directly to a Logstash endpoint. The output format must therefore be compatible with the format defined for Elasticsearch. In my case, some mandatory fields are defined, without which Elasticsearch will discard the logs. The approach is simple: use a pattern layout to log as JSON, and define additional pattern converter keys as needed so that the necessary data objects can be marshalled into the logs.
Below is the log4j2.xml.
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="debug" xmlns="http://logging.apache.org/log4j/2.0/config" packages="com.example.logger">
  <Properties>
    <!-- get from env instead -->
    <Property name="application">appName</Property>
    <Property name="app-version">0.0.1</Property>
    <Property name="host">localhost</Property>
    <Property name="env">localhost</Property>
  </Properties>
  <Appenders>
    <Console name="console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
    </Console>
    <RollingRandomAccessFile name="plain-log" fileName="logs/app_plain.log" filePattern="logs/app_plain.log.%i" append="false" immediateFlush="true" bufferSize="262144">
        <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg %ex%n"/>
        <Policies>
          <SizeBasedTriggeringPolicy size="1GB"/>
        </Policies>
        <DefaultRolloverStrategy fileIndex="max" min="1" max="100" compressionLevel="3"/>
    </RollingRandomAccessFile>
    <RollingRandomAccessFile name="json-log" fileName="logs/app.log" filePattern="logs/app.log.%i" append="true" immediateFlush="true" bufferSize="262144">
        <PatternLayout pattern='{"@timestamp":"%d{ISO8601}","thread":"%t","level":"%p","logger":"%c","description":"%m %ex","correlation_id":"%mdc{correlationid}","headers_data":%hd,"endpoint":"%mdc{endpoint}","environment":"${env}","application":"${application}","application_version":"${app-version}","type":"log","host":"${host}","data_version":2}%n'/>
        <Policies>
          <SizeBasedTriggeringPolicy size="1GB"/>
        </Policies>
        <DefaultRolloverStrategy fileIndex="max" min="1" max="100" compressionLevel="3"/>
    </RollingRandomAccessFile>
  </Appenders>
  <Loggers>
    <Logger name="com.example.core" level="debug" additivity="false">
      <AppenderRef ref="console" level="info"/>
      <AppenderRef ref="json-log"/>
      <AppenderRef ref="plain-log"/>
    </Logger>
    <Root level="info">
      <AppenderRef ref="console"/>
      <AppenderRef ref="json-log"/>
      <AppenderRef ref="plain-log"/>
    </Root>
  </Loggers>
</Configuration>
The json-log RollingRandomAccessFile appender has its PatternLayout written in JSON format with the necessary keys. Here, headers_data is a key that uses a custom pattern converter, %hd. This converter is defined in a class, HeadersDataConverter.java, as follows.
package com.example.logger;

import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.pattern.ConverterKeys;
import org.apache.logging.log4j.core.pattern.LogEventPatternConverter;
import org.apache.logging.log4j.util.ReadOnlyStringMap;

import com.example.logger.bean.RequestHeaderData;

/** headers_data converter pattern */
@Plugin(name="HeadersDataConverter", category="Converter")
@ConverterKeys({"hd", "headersData"})
public class HeadersDataConverter extends LogEventPatternConverter {

    protected HeadersDataConverter(String name, String style) {
        super(name, style);
    }

    public static HeadersDataConverter newInstance(String[] options) {
        // Converter name and style; the options are not used by this converter
        return new HeadersDataConverter("headersData", "headersData");
    }

    private RequestHeaderData setHeaderData(LogEvent event) {
        ReadOnlyStringMap ctx = event.getContextData();
        RequestHeaderData hd = new RequestHeaderData();

        hd.setAccept(ctx.getValue("accept"));
        hd.setAcceptEncoding(ctx.getValue("accept-encoding"));
        hd.setAcceptLanguage(ctx.getValue("accept-language"));
        // ...
        hd.setxPoweredBy(ctx.getValue("x-powered-by"));
        return hd;
    }

    @Override
    public void format(LogEvent event, StringBuilder toAppendTo) {
        toAppendTo.append(setHeaderData(event));
    }
}
RequestHeaderData is a serializable Java bean that overrides toString() to marshal the object to a JSON string using Jackson's ObjectMapper.
package com.example.logger.bean;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.PropertyNamingStrategy;
import com.fasterxml.jackson.databind.annotation.JsonNaming;

import java.io.Serializable;

/** headers_data bean */
@JsonIgnoreProperties(ignoreUnknown = true)
@JsonNaming(PropertyNamingStrategy.SnakeCaseStrategy.class)
public class RequestHeaderData implements Serializable {

    private static final long serialVersionUID = 3559298447657197997L;
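    // Shared by all instances: ObjectMapper is thread-safe once configured, and creating one per log event is costly
    private static final ObjectMapper mapper = new ObjectMapper();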

    private String accept;
    private String acceptEncoding;
    private String acceptLanguage;
    // ...
    private String xPoweredBy;

    public RequestHeaderData() {}
    
    // Generate getters and setters. Eclipse or any other IDE can do that for us.

    @Override
    public String toString() {
        try {
            return mapper.writeValueAsString(this);
        } catch (Exception ex) {
            // Fall back to JSON null so that "headers_data" stays valid JSON
            return "null";
        }
    }
}
SnakeCaseStrategy is the naming strategy used here; it automatically converts camelCase property names to snake_case ones (acceptEncoding becomes accept_encoding). Individual names can be overridden using @JsonProperty("override_string_here"). That is all there is. Specifying packages="com.example.logger" in log4j2.xml lets us use the HeadersDataConverter plugin, registered with hd as one of its pattern converter keys.
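To make the renaming concrete, here is a rough Java approximation of the camelCase to snake_case rule, assuming only the behaviour visible in the log output below: an underscore before each capital that starts a new word, with everything lower-cased. SnakeCase.toSnakeCase is a hypothetical helper, not Jackson's implementation, and it skips edge cases such as leading underscores.

```java
// Approximation of the camelCase -> snake_case renaming done by Jackson's
// SnakeCaseStrategy (hypothetical helper, not the Jackson source).
final class SnakeCase {

    static String toSnakeCase(String name) {
        StringBuilder out = new StringBuilder(name.length() + 8);
        boolean prevUpper = false; // a run of capitals is kept as one word
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (Character.isUpperCase(c)) {
                // Start a new word, unless at the start, right after '_' or inside a run of capitals.
                if (!prevUpper && out.length() > 0 && out.charAt(out.length() - 1) != '_') {
                    out.append('_');
                }
                out.append(Character.toLowerCase(c));
                prevUpper = true;
            } else {
                out.append(c);
                prevUpper = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"accept", "acceptEncoding", "xPoweredBy", "upgradeInsecureRequests"};
        for (String field : fields) {
            System.out.println(field + " -> " + toSnakeCase(field));
        }
    }
}
```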

Now we have logs in the format
{"@timestamp":"2018-02-08T18:40:07,793","thread":"main","level":"INFO","logger":"com.example.web","description":"Service started. ","correlation_id":"","headers_data":{"accept":null,"accept_encoding":null,"accept_language":null,"cache_control":null,"client_ip":null,"correlationid":null,"connection":null,"content_length":null,"content_type":null,"dnt":null,"host":null,"remote_addr":null,"request_method":null,"path_info":null,"pragma":null,"query_string":null,"true_client_ip":null,"url":null,"upgrade_insecure_requests":null,"user_agent":null,"via":null,"x_forwarded_for":null,"x_forwarded_host":null,"x_forwarded_port":null,"x_forwarded_proto":null,"x_orig_host":null,"x_powered_by":null},"endpoint":"","environment":"localhost","application":"appName","application_version":"0.0.1","type":"log","host":"localhost","data_version":2}

The project.clj should contain the following dependencies.
; ...
; logging
[org.clojure/tools.logging "0.4.0"]
[org.apache.logging.log4j/log4j-core "2.9.0"]
[org.apache.logging.log4j/log4j-api "2.9.0"]
[org.apache.logging.log4j/log4j-slf4j-impl "2.9.0"]
; custom json logging
[com.fasterxml.jackson.core/jackson-core "2.9.2"]
[com.fasterxml.jackson.core/jackson-annotations "2.9.2"]
[com.fasterxml.jackson.core/jackson-databind "2.9.2"]
[org.slf4j/slf4j-api "1.7.24"]
; ....
:source-paths ["src"]
:test-paths ["test"]
:java-source-paths ["src-java"]
:javac-options ["-target" "1.8" "-source" "1.8" "-Xlint:unchecked" "-Xlint:deprecation"]
; ...
tools.logging is the Clojure library that provides macros which delegate logging to the underlying logging library (log4j2 here). SLF4J (slf4j-api) is a logging facade that works with different logging libraries, and most well-known logging libraries implement or bind to it. So any third-party library that logs through SLF4J, for instance one that normally uses logback, will work. But we need the log4j-slf4j-impl binding, which routes all logs that go through SLF4J to log4j2. And since we defined a custom pattern, it applies to all the logs. Simple it is.

The only caveat here is that the custom pattern converter requires a well-defined class. If the object is not known at compile time, for instance when logging arbitrary JSON, it is easier to extend the Layout instead.
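Whichever route is taken, every value written into the JSON line has to be escaped: a message containing a double quote or a newline (a multi-line stack trace from %ex, for instance) would otherwise produce an invalid line. Below is a stdlib-only Java sketch of the serialization step such a custom layout might perform. JsonLine, escape and toJson are hypothetical names, not the log4j2 Layout API, and only string and null values are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a map of fields into one JSON object per line.
// Only String and null values are handled, to keep the sketch short.
final class JsonLine {

    // Escapes a string for use inside a JSON string literal (RFC 8259):
    // double quote, backslash and all control characters.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the 4-digit hex escape form
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Serializes the fields in insertion order; a null value becomes JSON null.
    static String toJson(Map<String, String> fields) {
        StringBuilder out = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                out.append(',');
            }
            first = false;
            out.append('"').append(escape(e.getKey())).append("\":");
            String v = e.getValue();
            out.append(v == null ? "null" : "\"" + escape(v) + "\"");
        }
        return out.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("level", "ERROR");
        fields.put("description", "bad \"input\"\n\tat Foo.bar");
        fields.put("correlation_id", null);
        System.out.println(toJson(fields));
    }
}
```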

ThreadContext (MDC)
ThreadContext is the thread-local data that can be attached to a particular thread in log4j2; SLF4J calls this MDC (Mapped Diagnostic Context). The point is that when a server gets a request, which is handled by one thread or handed over to subsequent threads, all logs produced while executing that request can be identified with some unique identifier, so that we can easily correlate all the logs for that particular request. Furthermore, if we have multiple services, we can correlate their logs using a unique correlationId, if set. This is done by setting appropriate values in the thread-local context map.

Let's see how to do this with Aleph server in Clojure.
(ns com.example.web
  "Web Layer"
  (:require [aleph.http :as http]
            [manifold.stream :as stream]
            [compojure.core :as compojure :refer [GET POST defroutes]]
            [compojure.response :refer [Renderable]]
            [ring.middleware.params :refer [wrap-params]]
            [ring.middleware.keyword-params :refer [wrap-keyword-params]]
            [clojure.core.async :as async]
            [clojure.java.io :as io]
            [clojure.tools.logging :as log])
  (:import [org.apache.logging.log4j ThreadContext]))

(extend-protocol Renderable
  manifold.deferred.IDeferred
  (render [d _] d))

(defn say-hi [req]
  {:status 200
   :body "hi"})

(defmacro with-thread-context [ctx-coll & body]
  `(do
    (ThreadContext/putAll ~ctx-coll)  ;Set thread context
    (try
      ~@body
      (finally
        (ThreadContext/clearAll)))))  ;Clear it so pooled threads do not leak context

(defn wrap-logging-context [handler]
  (fn [request]     
    ;; Set request map and other info in the current thread context
    (ThreadContext/putAll (merge {"endpoint" (:uri request)
                                  "remote-addr" (:remote-addr request)
                                  "query-string" (:query-string request)}
                                  (:headers request)))
    (handler request)))

(defn http-response [response options]
  (ThreadContext/clearAll)  ;Clears thread context
  response)

(defn wrap-http-response
  {:arglists '([handler] [handler options])}
  [handler & [{:as options}]]
  (fn 
    ([request]
      (http-response (handler request) options))
    ([request respond raise]
      (handler request (fn [response] (respond (http-response response options))) raise))))

(defn say-hi-handler [req]
  (let [ctx (ThreadContext/getContext)]  ;Get current thread context
    (stream/take!
      (stream/->source
        (async/go
          (let [_ (async/<! (async/timeout 1000))]
            (with-thread-context ctx
              (say-hi req))))))))

(defroutes app-routes
  (POST ["/hi/"] {} say-hi-handler))

(def app
  (-> app-routes
      (wrap-logging-context)
      (wrap-keyword-params)
      (wrap-params)
      (wrap-http-response)))

(defn -main []
  (http/start-server #'app {:port 8080})
  (log/info "Service started."))
Here we wrap the Ring handler with the wrap-logging-context middleware, which sets the request data in the thread context of the server thread handling that particular request. Since the Compojure route handlers continue their work asynchronously on other threads (a core.async go block in say-hi-handler), we need to pass the context to those threads. For that, we get the context ctx in say-hi-handler and use the with-thread-context macro to restore it there. That's all there is to logging with thread context.
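The capture-and-restore step can be seen in isolation in the Java sketch below, which assumes a plain ThreadLocal map as a stand-in for log4j2's ThreadContext; ContextPropagation and its methods are hypothetical. A worker thread does not see the request thread's map on its own, so a snapshot has to be taken on the request thread and installed on the worker, which is the job ctx and with-thread-context do in the Clojure code. Clearing the worker's map in a finally block is a design choice here, so that a pooled thread does not carry one request's correlation id into its next task.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for log4j2's ThreadContext: one context map per thread.
final class ContextPropagation {

    private static final ThreadLocal<Map<String, String>> CONTEXT =
        ThreadLocal.withInitial(HashMap::new);

    static void putAll(Map<String, String> values) { CONTEXT.get().putAll(values); }

    static String get(String key) { return CONTEXT.get().get(key); }

    // Copy of the current thread's map, similar in spirit to (ThreadContext/getContext).
    static Map<String, String> snapshot() { return new HashMap<>(CONTEXT.get()); }

    static void clear() { CONTEXT.get().clear(); }

    // Installs the captured copy on whichever thread runs the task,
    // runs the work, then clears that thread's map.
    static <T> Callable<T> withContext(Map<String, String> captured, Callable<T> work) {
        return () -> {
            putAll(captured);
            try {
                return work.call();
            } finally {
                clear();
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> readId = () -> get("correlationid");
        try {
            // Request thread: middleware-style setup of the context.
            putAll(Collections.singletonMap("correlationid", "req-42"));
            // The worker thread has its own, empty map...
            System.out.println(pool.submit(readId).get());                          // null
            // ...unless the captured context is handed over explicitly.
            System.out.println(pool.submit(withContext(snapshot(), readId)).get()); // req-42
        } finally {
            clear();
            pool.shutdown();
        }
    }
}
```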

Sidenote: Getting log4j2 to read the config when building a standalone jar is a big mess, because of Clojure, Java interop and compilation nuances. Makes me hate everything in this universe.

Thursday, 18 January 2018

Rapid Prototyping in Clojure with boot-clj

I find boot-clj to be great for rapid prototyping. It can be considered analogous to GroovyConsole. We can dynamically add dependencies, write and modify code, run it and experiment, all within a single file, without having to create a project as with the default lein setup. Create a folder for experiments and add a boot.properties file to it.
#https://github.com/boot-clj/boot
BOOT_CLOJURE_NAME=org.clojure/clojure
BOOT_VERSION=2.7.1
BOOT_CLOJURE_VERSION=1.8.0
Then we can create our prototype files, say pilot.clj, with the example template below.
#!/usr/bin/env boot

;; To add some repository
(merge-env! :repositories [["clojars" {:url "https://clojars.org/repo/" :snapshots true}]])

(defn deps
  "Add dependencies to the boot environment."
  [new-deps]
  (merge-env! :dependencies new-deps))

;; Add clojure
(deps '[[org.clojure/clojure "1.8.0"]])

;; Require
(require '[clojure.string :as str])
;; Import
(import '[javax.crypto.spec SecretKeySpec])

(println (str/upper-case "hi"))  ;; HI 
For faster boot-clj startup, add the following to the shell profile (.zshrc), tuning the values for the machine.
# boot-clj faster startup
export BOOT_JVM_OPTIONS='
  -client
  -XX:+TieredCompilation
  -XX:TieredStopAtLevel=1
  -Xmx2g
  -XX:+UseConcMarkSweepGC
  -XX:+CMSClassUnloadingEnabled
  -Xverify:none'