Saturday, 31 March 2018

OpenSAML 3 Java - VU#475445 Workaround

As described in the vulnerability note VU#475445, OpenSAML 3 library (including Java library) is vulnerable to the assertions that use C14N canonicalization algorithm. Duo had found this CVE. A quick fix is below.
; ..
(:require [clojure.string :as str])
; ..

(def ^:dynamic *subject-val*)

(defn get-subject-from-node [nodes i c]
  (when (and (< i c) (not (realized? *subject-val*)))
    (let [node (.item nodes i)
          node-name (.getNodeName node)]
      (when (str/includes? node-name "Assertion")
        (get-subject-from-node (.getChildNodes node) 0 (.getLength (.getChildNodes node))))
      (when (str/includes? node-name "Subject")
        (get-subject-from-node (.getChildNodes node) 0 (.getLength (.getChildNodes node))))
      (when (str/includes? node-name "AuthenticationStatement")
        (get-subject-from-node (.getChildNodes node) 0 (.getLength (.getChildNodes node))))
      (when (str/includes? node-name "NameID")  ; SAML v2
        (deliver *subject-val* (.getTextContent node)))
      (when (str/includes? node-name "NameIdentifier")  ; SAML v1
        (deliver *subject-val* (.getTextContent node)))
      (get-subject-from-node nodes (inc i) c)))
  @*subject-val*)
This will extract subject from SAMLResponse in v1 or v2 format.
(defn get-subject [doc-elem assertion]
  (binding [*subject-val* (promise)]
    (let [name-id (.getNameID (.getSubject assertion))
          sub (.getValue name-id)
          resp-node (.getFirstChild (.getParentNode doc-elem))
          resp-node-childs (.getChildNodes resp-node)
          sub-cve (get-subject-from-node resp-node-childs 0 (.getLength resp-node-childs))]
      (if (= sub sub-cve) sub sub-cve))))
Here we proceed with the OpenSAML 3 unmarshalled object and obtained the subject value by calling (.getValue name-id). But since the library is vulnerable, we do a manual extraction of the subject value by calling (get-subject-from-node resp-node-childs 0 (.getLength resp-node-childs)). If they do not match, we return the sub-cve value as get-subject-from-node extracts the subject ignoring any comment added to the subject value.

Example
<saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">2<!-- foo -->1@example.com</saml:NameID></code>
The assertion can be replaced with a subject value as the above. Since C14N ignores comment, even if we modify the XML, to add comments after signing, the validation will ignore the added comments while recomputing the signature.

NB: the subject 2<!-- foo --> should not be in escaped form in the SAMLResponse.

With this, opensaml 3 will give us sub as 1@example.com and the workaround function will give sub-cve as 21@example.com.

Sidenote
; ...
(:import [javax.xml.parsers DocumentBuilderFactory]
         [org.xml.sax SAXException])
; ...

(def disallow-doctype-dec "http://apache.org/xml/features/disallow-doctype-decl")
(def external-parameter-entities "http://xml.org/sax/features/external-parameter-entities")
(def load-external-dtd "http://apache.org/xml/features/nonvalidating/load-external-dtd")

(defn get-doc-builder 
  "Returns a document builder, which should be called for each thread as parse is not thread safe"
  []
  (let [doc-builder-factory (DocumentBuilderFactory/newInstance)]
    (.setNamespaceAware doc-builder-factory true)
    ;; prevent XXE
    (.setFeature doc-builder-factory disallow-doctype-dec true)
    (.setFeature doc-builder-factory external-parameter-entities false)
    (.setFeature doc-builder-factory load-external-dtd false)
    (.setXIncludeAware doc-builder-factory false)
    (.setExpandEntityReferences doc-builder-factory false)
    (.newDocumentBuilder doc-builder-factory)))

(defn parse-xml
  "Parse the given xml which can be a input stream, file, URI."
  [xml]
  (try
    (.parse (get-doc-builder) xml)
    (catch SAXException excep
      (throw (ex-info "XML parse exception." {:cause [:err-xml-parse]})))))
Here we use JAXP parser to parse the XML and use OpenSAML 3 to do the unmarshalling of the parsed XML org.w3c.dom.Document to OpenSAML 3 objects for easy extraction and validation of the SAMLResponse.

Friday, 23 March 2018

Everything is a Monad - Formalising Data Flow Pipeline in Clojure with Category Theory

A basic notion of data flow pipeline is described in a previous post, Data Flow Pipeline in Clojure. Here we will add some formalism to it using Category Theory applied in the context of Clojure. To relate to this, we can think of Haskell. Haskell is a language which is based on typed lambda calculus and it implements the concepts from category theory. Haskell defines its own category named Hask, objects and arrows belonging to it are bounded by laws. Now Clojure is untyped lambda calculus (but not in its strict sense as the reduction model is different, nevertheless a point which differentiates these two languages, plus no one defined any particular category in a way that the s-expressions are bounded by the laws of the category theory, which are not really required, but anyway). Now to the land of Clojure.

1. First we need to define a category, let's say turtle (name inspired from the first programming language I learned, LOGO).

2. A category consists of a collection of entities called objects and a collection of entities called arrows and few other properties (assignments and composition).

3. There are two assignments source (or domain), and target (or codomain) such that each of which attaches an object to an arrow. That is an object in source consumes an arrow and returns an object in the target. This can be represented as
:
                    f
  A -----------------------------------> B
source            arrow               target
domain           morphism            codomain
                   map
4. So to model this, let's use clojure.spec.
(spec/def ::variant vector?)
(spec/def ::tag keyword?)
(spec/def ::msg string?)
(spec/def ::state (spec/map-of keyword? any?))
(spec/def ::turtle-variant (spec/tuple ::tag ::msg ::state))
Here we defined the object to be a ::turtle-variant, the structure of which is [keyword? string? map?]. We need to check that the arrows takes the objects that belongs to the turtle category. Else the morphism cannot belong to the category.
(defn validate-variant 
  "Check if the given element belongs to the category turtle.
   Given a data structure, check if it conforms to a variant. If it does not explain the reason for non-conformance."
  [predicate]
  (let [rule (fn [f] (f ::turtle-variant predicate))]
    (if-not (rule spec/valid?)
      (rule spec/explain)
      true)))
5. Some helper functions.
(defn ok
  "Construct an object of type success in turtle which can passed to a functor. Returns a ::turtle-variant."
  [msg result]
  [:ok msg result])

(defn err
  "Construct an object of type error in turtle which can be passed to a functor. Returns a ::turtle-variant."
  [msg ex]
  [:err msg ex])
6. Now let us create an arrow that the objects in the category can consume and which can return an object. The return object must be in the category. Since the arrow can produce and return object, we can intuitively refer it to as functions or morphism. Here the object in the category is ::turtle-variant and since the morphism produce the same object, we call it endomorphism. It is a mapping of object in a category to itself. We define a functor such that it takes some transformation, but preserves the structure. Below we can see that the fetch takes in a ::turtle-variant and returns a ::turtle-variant.
(defn fetch
  "A functor which performs some operation on the object."
  [varg]
  (let [[tag msg {:keys [url opts] :as state}] varg
        [status body status-code] (http-get url opts)]
    (condp = status
      :ok (ok msg (merge {:res body} state))
      :err (err "Error" {:msg msg :excep body}))))

(fetch (ok "html" {:url "http://example.com" :opts nil})
7. The turtle category is not fully defined yet, because, we need to define partial composition of arrows. Identity can be defined trivially.
Arw x Arw -> Arw
Now, here is where we differ from the most monad libraries out there that tries to implement what Haskell does. Here partial composition does not mean, a function should return a partial. Certain pair of arrows are compatible for composition to form another arrow. Two arrows
:
      f                  g
A ---------> B1   B2 ---------> C
are composable in that order precisely, when B1 and B2 are the same object, and then an arrow
A ---------> C
is formed.

8. If we look at the way we defined the functions and the type constraint on the function, this law holds. Because here all functions takes the same structure and return the same structure, the ::turtle-variant.
At this point, we have defined our category turtle.

9. But from a programmer's perspective, simple having standalone functions are not useful. We need to compose them. Meaning, call a function passing arg if necessary, take the return, call another function with arg of the return of the previous function and such.
(defn endomorph
  "Apply a morphism (transformation) for the given input (domain) returning an output (co-domain) where both domain and
   co-domain belongs to the category turtle. The functors are arrows in the same category. This qualifies as a monad
   because it operates on functors and returns an object. And associative law holds because in which way we compose the functors,
   the result object is ::turtle-variant. This is also a forgetful functor.
   
   Decide whether to continue with the computation or return. All function in the chain should accept and return a variant,
   which is a 3-tuple."
  ([fns]
    (endomorph fns (ok "" {}))) 
  ([fns init-tup]
    (reduce (fn [variant fun]
      {:pre [(validate-variant variant)]
       :post [(if (reduced? %) (validate-variant (deref %)) (validate-variant %))]}
      (let [[tag _ _] variant]
        (condp = tag
          :err (reduced variant)  ;if tag is :err short-circuit and return the variant as the result of the computation
          :ok (fun variant))))
      init-tup
      fns)))
The endomorph is a monoid. A monoid structure is defined as
(R, *, 1)
where R is a set, * is a binary operation on R and 1 is a nominated element of R, where
(r * s) * t = r * (s * t)    1 * r = r = r * 1
In our case, the set R is the set of functions that operates on functors. And we have defined only one, which is the endomorph function. This function operates on functors (which are in itself endofunctors here). The binary operation is the reduce function and identity is trivial. We can use constantly and identity functions defined in Clojure to do that. The associative law hold because, here the order of composition does not matter because, in which ever way we order the composition, the result is the same object ::turtle-variant. That is from a theory perspective. But from a programmer's perspective, we cannot do arbitrary ordering as we need values within the structure. So conceptually it is associative, but the order of computation matters.

10. Now this is also a monad. The classic definition of a monad follows.
All told a monad in X is just a monoid in the category of endofunctors of X, with product * replaced by composition of endofunctors and unit set by the identity endofunctor
-- Saunders Mac Lane in Categories for the Working Mathematician
From the above definition, it is clear that the endomorph is a monad as it operates on endofunctors and itself is a monoid (from above). Now it is also a forgetful functor because some property is discarded once we apply the transformation.

Example usage:
;; Let's say we have many functions similar to fetch, which confirms to the contract defined
;; in the :pre and :post of reduce function of endomorph, we can chain them as below
(endomorph [fetch parse extract transform save] (ok "Init Args" {:url "http://example.com" :xs ".."}))

After all this proof, I came to the realisation that Leibniz was right about everything being a monad.