This is my basic, no-frills blog. It contains posts about math as well as about other topics. Lately, they've mostly been about Linux and Vim.
I eventually intend to add a tag- and category-based word cloud so that the blog is easier to navigate.
23 May 2018
I didn’t understand some things about stopping times and the debut theorem when I taught this class, so I’ve added some extra notes here. Next time, I’ll certainly build some more general ideas about stochastic processes like joint and progressive measurability after constructing Brownian motion. I think this might be more helpful than doing fiddly little things about the reflection principle and Hausdorff dimension.
Carl Mueller showed me Greg Lowther’s notes about stochastic processes. They are very well written. I used them quite heavily for problem 11 and problem 12.
-
Poincare recurrence (easy warmup). We have the usual setup $(\W,\mathcal{F},\Prob,T)$ with $T$ measure preserving and ergodic. Prove
\[\lim_{n \to \infty} \frac{1}{n} \# \left\{ k \leq n : T^k x \in Q \right\} = \Prob(Q)\]
Then, if $R$ is the return time to $Q$, prove that (you may repeat the proof in class)
\(\Prob(R < \infty) = 1.\)
Solution: This was essentially the ergodic theorem, and all of you got this right.
-
An extreme point of a convex set cannot be written as a nontrivial convex combination of two other points in the set. Let $\mathcal{I}$ be the invariant $\sigma$-algebra. Let $\mathcal{M}$ be the space of all $T$-invariant probability measures.
-
Show that a probability measure is ergodic iff it is an extreme point of $\mathcal{M}$.
Solution: An extreme point of a convex set cannot be written as a nontrivial convex combination of two other points of that set. If $P$ is not ergodic, then there is a nontrivial invariant set $A$. Writing $\mu(B) = P(B | A)$ and $\nu(B) = P(B | A^c)$ gives two invariant measures with $P = P(A) \mu + P(A^c) \nu$. Therefore, $P$ cannot be an extreme point. All of you got this part of the argument right.
Most of you did not get the second part right. If $P$ is not an extreme point of $\mathcal{M}$, then it can be written as a nontrivial convex combination $P = p \nu + (1-p) \mu$, where $\nu$ and $\mu$ are distinct extreme points (hence, by the first part, ergodic). We claim that $P$ cannot be ergodic.
The only thing we can use at this point is the ergodic theorem, so let's keep that in mind as we proceed. If $\nu$ and $\mu$ are different measures, there must be a bounded function $f$ (the indicator of a set, e.g.) such that
\[\int f d \nu \neq \int f d \mu\]
So the pointwise ergodic theorem shows that there is a $\nu$-null invariant set $B$ such that the ergodic averages $S_n f \to \int f d \nu$ on $\W \setminus B$. Similarly, $S_n f \to \int f d \mu$ off a $\mu$-null invariant set $B'$. Since the two limits differ, $\W \setminus B' \subset B$, and hence $\mu(B) = 1$ while $\nu(B) = 0$. Thus, we have found an invariant set $B$ such that
\[P(B) = p \nu(B) + (1-p) \mu(B) = 1 - p \in (0,1),\]
so $P$ is not ergodic.
-
Assume that $\W$ is “nice”. Then, from your previous semester’s probability class, any measure $\Prob$ has a regular conditional probability with respect to the invariant $\sigma$-algebra such that
\[\Prob(\w,A) = \E[ 1_A | \mathcal{I} ](\w)\]
Show that the regular conditional probability is ergodic for the translation map $T$.
Solution: The niceness of the space lets you assume that a regular conditional probability exists. Our version of niceness always included the requirement that the $\sigma$-algebra be countably generated. So it's enough to check everything on a countable collection of generating sets.
Most of you got this part right. Essentially, for any invariant set $E$, you have to check
\[\Prob(\w,E) = \E[ 1_E | \mathcal{I} ](\w) = 1_E(\w) \in \{0,1\}\]
Since $E$ is invariant, $E \in \mathcal{I}$, so the conditional expectation leaves the indicator unchanged. This may be proved by testing on the countable generating set.
-
Let $\mathcal{M}_e$ be the set of all ergodic invariant measures. Then, show that for any invariant measure $P$, there is a probability measure $\mu_P$ on the set $\mathcal{M}_e$ such that
\[P(A) = \int_{\mathcal{M}_e} Q(A) \mu_P(dQ)\]
Hint: For any measurable set $A$, prove that
\[P(A) = \int_{\W} P(\w,A) dP(\w)\]
Remark: This is an important fact. Any invariant probability measure can be decomposed as a convex combination of its ergodic components using the regular conditional probability with respect to the invariant $\sigma$-algebra.
Solution: All of you checked the tower property.
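A trivial example to keep in mind (my own illustration, not from the problem set): take $T$ to be the identity map on a nice space $\W$. Every probability measure is then invariant, the invariant $\sigma$-algebra is everything, the ergodic measures are exactly the point masses $\delta_\w$, and the regular conditional probability is $\Prob(\w,\cdot) = \delta_\w$, so the decomposition reads
\[P(A) = \int_\W \delta_\w(A) \, P(d\w).\]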
People tend to use Choquet's theorem or Krein–Milman for the last two parts instead of using the conditional probability. See wikipedia or this pdf. In one line: the invariant signed measures form a topological vector space, and the invariant probability measures form a compact convex subset of it (in the weak* topology), so Choquet's theorem applies.
-
Recall the definitions of unique ergodicity and generic points from your notes (or mine, on this website). The claim is that the following are equivalent.
- $(\W,T)$ is uniquely ergodic.
- Every $\w \in \W$ is generic.
Recall that generic points are points for which the ergodic theorem holds for all continuous functions on $\W$ (it comes with a topology). The system is uniquely ergodic if there is only one measure $\mu$ that is invariant under $T$.
Prove that the circle rotation $T_\theta \colon [0,1) \to [0,1)$, $T_\theta(\w) = \w + \theta \mod 1$ with $\theta$ irrational, is uniquely ergodic, with Haar measure as its unique invariant measure. Hint: Look in Durrett.
Solution: Durrett shows that the ergodic theorem holds for indicators of half-open intervals in Theorem 7.2.4 (version 4.1 on his website).
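Not part of the solution, but a quick numerical sanity check (my own illustration, with my own function names): for irrational $\theta$, the ergodic average of an indicator should converge to the Lebesgue measure of the interval from *every* starting point, which is exactly what unique ergodicity predicts.

```python
# Numerical sanity check (not a proof): for irrational theta, the ergodic
# average of 1_[0, 0.3) along the orbit of the rotation T(w) = w + theta mod 1
# should converge to Leb([0, 0.3)) = 0.3 from every starting point.
import math

def ergodic_average(x, theta, n, a=0.0, b=0.3):
    """Average of 1_[a,b) along the first n points of the orbit of x."""
    hits = 0
    for _ in range(n):
        if a <= x < b:
            hits += 1
        x = (x + theta) % 1.0
    return hits / n

theta = math.sqrt(2) % 1.0            # irrational rotation number
for start in (0.0, 0.123, 0.987):
    avg = ergodic_average(start, theta, 200_000)
    print(f"start={start}: average = {avg:.4f}")  # all close to 0.3
```

The independence of the limit from the starting point is the content of every point being generic.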
-
Write down the probability
\[\Prob_0((B(t_1),\ldots,B(t_n)) \in A_1 \times \cdots \times A_n)\]
for Brownian motion in $\R^d$. Here, $A_i$ are measurable sets in $\R^d$ and $0 < t_1 < \cdots < t_n$.
Yes, this problem is that easy.
Solution
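For the record, the intended answer is presumably the standard formula in terms of the Gaussian transition kernel: with $t_0 = 0$, $x_0 = 0$, and $p_t(x,y) = (2\pi t)^{-d/2} e^{-|y - x|^2/2t}$,
\[\Prob_0((B(t_1),\ldots,B(t_n)) \in A_1 \times \cdots \times A_n) = \int_{A_1} \cdots \int_{A_n} \prod_{i=1}^n p_{t_i - t_{i-1}}(x_{i-1},x_i) \, dx_n \cdots dx_1.\]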
-
Next, we construct Brownian motion on $[0,1]$ using the Haar basis. (By "we" I mean that I asked Carl for a reference, and he randomly picked McKean's book. There's no escaping him.) This proof is due to Paul Lévy, and was later simplified by Ciesielski. Consider the Haar basis, defined by $f_0 = 1_{[0,1]}$ and
\[f_{k2^{-n}} =
\begin{cases}
+ 2^{(n-1)/2} & (k-1)2^{-n} \leq t < k 2^{-n} \\
- 2^{(n-1)/2} & k 2^{-n} \leq t < (k+1) 2^{-n} \\
0 & \text{otherwise} \\
\end{cases}\]
for odd $k < 2^n$. Show that $\{ f_{k2^{-n}} \}_{k,n}$ forms an orthonormal basis for $L^2[0,1]$. Note that the integrals of this basis are the Schauder functions
\[\int_0^t f_{k 2^{-n}}(s)\,ds\]
which form tent-shaped functions (draw pictures for yourself).
Levy’s idea is to use the formal Haar series to define Brownian motion. Let $g_{k 2^{-n}}$ be an independent Gaussian family of mean-$0$ random variables with variance $1$. Define
\[b(t) = g_0 \int_0^t f_{0} + \sum_{n=1}^\infty \sum_{k \text{ odd} < 2^n} g_{k 2^{-n}} \int_0^t f_{k 2^{-n}},\]
a random Fourier series.
There are two things to check:
- The series for $b(t)$ is uniformly convergent on $[0,1]$. Hence, $b(t)$ is continuous.
-
The process $b(t)$ has the right correlations
\[\E[ b(t) b(s) ] = t \wedge s\]
For $1.$, let
\[e_n = \Norm{\sum_{k \text{ odd} < 2^n} g_{k 2^{-n}} \int_0^t f_{k 2^{-n}}}{\infty} = 2^{-(n+1)/2} \max_{k \text{ odd } < 2^n} |g_{k 2^{-n}}|\]
You will have to show the second equality above. Then, estimate
\[\Prob\left( e_n > \theta \sqrt{2^{-n} \log 2^n} \right)\]
and then use the Borel-Cantelli lemma.
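Spelling the estimate out (this is the standard computation; there are $2^{n-1}$ odd $k < 2^n$, and $\Prob(|g| > x) \leq e^{-x^2/2}$ for $x \geq 1$):
\[\Prob\left( e_n > \theta \sqrt{2^{-n} \log 2^n} \right) = \Prob\left( \max_{k \text{ odd} < 2^n} |g_{k 2^{-n}}| > \theta \sqrt{2 \log 2^n} \right) \leq 2^{n-1} e^{-\theta^2 \log 2^n} = \tfrac{1}{2}\, 2^{(1-\theta^2) n},\]
which is summable for $\theta > 1$. Borel–Cantelli then gives $e_n \leq \theta \sqrt{2^{-n} \log 2^n}$ for all large $n$ almost surely, so $\sum_n e_n < \infty$ and the series converges uniformly.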
For the correlations in $2.$, use Parseval's relation for the Haar basis applied to the indicator functions $j_1$ and $j_2$ of the intervals $[0,t]$ and $[0,s]$. (This part is cute.)
Solution: All of you got this problem right, except for the correlation part. There is a bit of work to show that the basis is complete in $L^2$. For example, SQ and DS showed that indicators of dyadic intervals can be written as (limits of) linear combinations of the Haar basis.
\[\begin{align}
\E[ b(t) b(s) ]
& = \int_0^1 1_{[0,t]} f_{0} \int_0^1 1_{[0,s]} f_0 + \sum_{n=1}^\infty \sum_{k \text{ odd} < 2^n} \int_0^1 1_{[0,s]} f_{k 2^{-n}} \int_0^1 1_{[0,t]} f_{k 2^{-n}}\\
& = \left( 1_{[0,t]} , f_{0} \right) (1_{[0,s]}, f_0) + \sum_{n=1}^\infty \sum_{k \text{ odd} < 2^n} ( 1_{[0,t]}, f_{k 2^{-n}} ) ( 1_{[0,s]}, f_{k 2^{-n}} ) \\
& = (1_{[0,t]}, 1_{[0,s]})
\end{align}\]
where the last identity follows from Parseval’s relationship for the Haar basis. Parseval’s relationship is fairly straightforward to prove for yourself.
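The construction can be sketched numerically. This is my own minimal implementation (function names are mine, not from the course): `schauder(t, k, n)` is the closed form of $\int_0^t f_{k2^{-n}}$, a tent supported on $[(k-1)2^{-n}, (k+1)2^{-n}]$ with peak $2^{-(n+1)/2}$, and the truncated Parseval sum approximates $\E[b(t)b(s)] = s \wedge t$ with error at most $2^{-(N+1)}$.

```python
# Minimal sketch of the Levy-Ciesielski construction (my own code).
# cov(t, s, N) is the Parseval sum truncated at level N; it should be within
# 2^{-(N+1)} of min(s, t), since at each level at most one tent covers t.
import random

def schauder(t, k, n):
    """Integral from 0 to t of the Haar function f_{k 2^{-n}} (odd k < 2^n)."""
    h = 2.0 ** (-n)
    a, mid, b = (k - 1) * h, k * h, (k + 1) * h
    amp = 2.0 ** ((n - 1) / 2)          # height of the Haar function
    if t <= a or t >= b:
        return 0.0
    return amp * ((t - a) if t < mid else (b - t))

def cov(t, s, N):
    """Truncated Parseval sum approximating E[b(t) b(s)] = min(s, t)."""
    total = t * s                        # contribution of f_0 = 1_[0,1]
    for n in range(1, N + 1):
        for k in range(1, 2 ** n, 2):    # odd k < 2^n
            total += schauder(t, k, n) * schauder(s, k, n)
    return total

def sample_path(N, m=200, rng=random):
    """One approximate Brownian path on m+1 grid points, truncated at level N."""
    g0 = rng.gauss(0.0, 1.0)
    coeffs = {(k, n): rng.gauss(0.0, 1.0)
              for n in range(1, N + 1) for k in range(1, 2 ** n, 2)}
    ts = [i / m for i in range(m + 1)]
    return [g0 * t + sum(c * schauder(t, k, n)
                         for (k, n), c in coeffs.items()) for t in ts]

print(cov(0.7, 0.3, 16))  # ~ 0.3
```

Plotting `sample_path(8)` gives a passable Brownian path; the covariance check is exactly the Parseval computation from the solution, truncated at a finite level.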
-
Exercise 8.1.2 from Durrett. Show that Brownian motion is not Hölder continuous with exponent $1/2 + 1/k$.
-
Exercise 8.2.3 from Durrett. The set of local maxima of Brownian motion is almost surely dense.
Solution: Show it first for a rational interval. Given $(a,b)$, the maximum over $[a,b]$ is attained at some interior point: by Blumenthal's 0-1 law there is a sequence of times $t_k \downarrow a$ with $B_{t_k} = B_a$, so even if the maximum value is $B_a$, it is also attained in the interior (and similarly near $b$). Then, use the Markov property.
-
Let $\W_0 = \left\{ f \mid f \colon [0,\infty) \to \R \right\}$, and let $\mathcal{F}_0$ be the $\sigma$-algebra generated by finite-dimensional sets of the form $\{ f \colon f(t_i) \in A_i, t_i \in [0,\infty), A_i \subset \R, 1 \leq i \leq n \}$. Let $\mathcal{C} = \{ f \colon t \mapsto f(t) \text{ is continuous} \}$. Show that $\mathcal{C} \notin \mathcal{F}_0$.
Solution
Both NC and DS noticed the following: show that the collection of sets of the form
\[A = \left\{ \w \colon ( B(t_1), B(t_2), \ldots ) \in E \right\},\]
for some sequence $\{ t_k \}$ and some measurable set $E \in \mathcal{B}(\R^{\Z^+})$ in the product space, is both a $\pi$-system and a $\lambda$-system, and is hence a $\sigma$-algebra. So any $\sigma$-algebra containing the cylinder sets must contain all sets of this form.
To complete the argument, you do the following: if $\mathcal{C}$ were measurable, there would be a sequence $\{t_k\}$ and a set $E$ such that $\mathcal{C}$ is of the form above. However, consider the indicator of a point $c \in \R^+$, $1_{\{c\}}(x)$, which is not continuous. Some $t_k$ must equal $c$, otherwise $1_{\{c\}}$ would belong to $\mathcal{C}$. But there are uncountably many $c$ and only countably many $t_k$.
-
Suppose $f(t) > 0$ for all $t > 0$. Show that for some $c \in [0,\infty]$ we have
\[\overline{\lim_{t \to 0}} \frac{B(t)}{f(t)} = c \quad \Prob_0 \, \almostsurely\]
Solution: This was Blumenthal's 0-1 law. I liked SQ's solution, which also used the law of the iterated logarithm for Brownian motion here.
-
Suppose
\[\begin{align}
\mathcal{N}_x & = \left\{ A \colon A \subset D \text{ for some } D \text{ with } \Prob_x(D) = 0 \right\} \\
\mathcal{F}_s^x & = \sigma \left( \mathcal{F}^+_s \cup \mathcal{N}_x \right) \\
\mathcal{F}_s & = \cap_x \mathcal{F}^x_s
\end{align}\]
Show that $\{\mathcal{F}_s\}$ is a right-continuous filtration.
Solution: The key here is to notice (like NC did) that since $\sigma \left( \mathcal{F}^+_s \cup \mathcal{N}_x \right)$ is a completion of a measure space, all sets in the $\sigma$-algebra can be written in the form
\[A \cup B\]
where $A \in \mathcal{F}^+_s$ and $B \in \mathcal{N}_x$.
-
If $G$ is open, let $T = \inf \left\{ t \geq 0 \colon B_t \in G \right\}$, where $B_t$ is a $d$-dimensional Brownian motion. Show that $T$ is a stopping time. Repeat the problem with a closed set $K$ instead of $G$.
Do you need continuity of the Brownian motion here?
Can you find a discontinuous Markov process for which $T$ is not a stopping time?
Solution: This is a problem that I didn’t fully understand when I gave it, but while solving it myself, I learned about the full debut theorem. I’ll give you a short explanation of it. All of these concepts are better explained in Greg Lowther’s notes.
There are three ways to describe stochastic processes.
- As a collection of random variables ${X_t}_{t \geq 0}$, one for each $t$,
- As a path $t \to X_t(\w)$, one for each $\w$,
- And as a function from the product space $\R^+ \times \Omega \to \R^d$.
The third is a useful way to think about things, especially when you want to talk about stopping times and such.
Indistinguishability: Two processes $X_t$ and $Y_t$ are indistinguishable if
\[\Prob(X_t = Y_t \ \text{for all } t \geq 0 ) = 1\]
This is also called equivalent up to evanescence.
However, as we did in class, processes are usually described by their finite-dimensional distributions. So we need another weaker notion of equivalence.
Stochastically equivalent: Two processes are stochastically equivalent if they have the same finite-dimensional distributions. When they are defined on the same probability space, a closely related (and stronger) condition is that for each time $t \geq 0$,
\[\Prob( X_t = Y_t) = 1.\]
As we have seen before, stochastic equivalence does not imply indistinguishability. BUT, the following lemma relates stochastic equivalence and indistinguishability when you have some extra continuity.
Lemma: All right-continuous processes and left-continuous processes that are stochastically equivalent are indistinguishable.
The next important concept is joint-measurability.
Joint-measurability: A process is jointly measurable if it is measurable with respect to the product $\sigma$-algebra $\mathcal{B}(\R^+) \times \mathcal{F}$.
Note that the Borel $\sigma$-algebra is on the whole positive real line. Also notice that we're not using the fact that $\mathcal{F}_t$ is a filtration for the process $X_t$. There is a related notion called progressive measurability that does use the fact that $\mathcal{F}_t$ is a filtration. I'll touch upon that in a minute.
Lemma: All right-continuous and left-continuous processes are jointly measurable.
Suppose $\tau$ is any random time, $\tau \colon \Omega \to \R$. Note that $\tau$ need not be a stopping time; like joint measurability, this notion does not use the fact that $\mathcal{F}_t$ is a filtration.
Lemma: If $X_t$ is a jointly measurable process, then $X_\tau$ is a measurable random variable.
With what we have so far, let us state a baby version of the Debut Theorem. We assume that the filtration is right-continuous.
Theorem: Let $X$ be an adapted right-continuous stochastic process taking values in $\R$ that is defined on a complete filtered probability space. If $K$ is any real number, let
\[\tau(\w) = \inf \left\{ t \in \R^+ \colon X_t(\w) \geq K \right\}\]
Then $\tau(\w)$ is a stopping time. This is proved in this post about stopping times and the debut theorem.
I think the proof extends quite easily to hitting times of any closed set $K \subset \R^d$. However, this is not the most general result possible if we're allowed to assume that the process (like Brownian motion) is progressively measurable. This general version of the debut theorem does not assume anything about the continuity of the process either.
Next, we talk a little bit about measurable selection. This will lead into the generalized debut theorem, which talks about entrance times into measurable sets.
I’ll ignore some of the material from the post titled Filtrations and Adapted Processes, but I highly recommend reading it. The one definition we need from it is the following.
Progressive measurability: A process is called progressively measurable if for each fixed $t \geq 0$, the restricted map $X \colon [0,t] \times \W \to \R^d$ is $\mathcal{B}([0,t]) \times \mathcal{F}_t$ measurable.
Let $\pi_\W$ be the projection from the product space $\R^+ \times \W$ onto its second coordinate. That is, $\pi_\W((t,\w)) = \w$.
Theorem (Measurable Projection): If $(\OFP)$ is a complete probability space and $A \in \mathcal{B}(\R^+) \times \mathcal{F}$, then $\pi_\W(A) \in \mathcal{F}$.
This seems obvious, but it’s not. It’s also important for the space to be complete here.
There is an interesting mistake related to this that Lebesgue made. The mistake was discovered by Suslin, and this eventually led to the development of descriptive set theory.
Definition (Graph): The graph of a random time is
\[[\tau] = \left\{ (t,\w) \in \R^+ \times \W \colon t = \tau(\w) \right\}\]
Measurable projection allows us to classify stopping times by graph measurability.
Lemma: Let $\tau \colon \W \to \R^+ \cup \{\infty\}$ be any map. Then
- $\tau$ is measurable iff $[\tau]$ is jointly measurable.
- $\tau$ is a stopping time iff $[\tau]$ is progressive.
The second line could use some explanation: it means that
\[1_{[\tau]}(t,\w)\]
is a progressively measurable process.
The debut of a set $A \subset \R^+ \times \W$ is a map $D_A \colon \W \to \R^+$ defined by
\[D_A(\w) = \inf \left\{ t \in \R^+ \colon (t,\w) \in A \right\}\]
That is, it’s the first time $t \geq 0$ when the process enters the set $A$.
Theorem (Debut): If $A \subset \R^+ \times \W$ is progressively measurable and the filtration is right-continuous, then $D_A$ is a stopping time.
Corollary: If $X$ is a progressively measurable process, $K \subset \R^d$ is Borel, and $\mathcal{F}$ is right continuous, then
\[\tau = \inf \{ t \in \R^+ \colon X_t \in K \}\]
is a stopping time.
Comments: Since it's clear that breaking continuity is not going to work so easily to beat the debut theorem, we should try to break something else. One way to make something fail to be a stopping time is to break the right-continuity of the filtration. DS did the following: let $X$ be a Bernoulli(1/2) random variable, and define the process $A(t)$ as follows
\[A(t) =
\begin{cases}
0 & t = 0 \\
X & t > 0
\end{cases}\]
Notice that the debut time $T$ of the set $(1/2,\infty)$ is not a stopping time, since $\{ T \leq 0 \}$ is not $\mathcal{F}_0$ measurable. The property that DS broke is right-continuity: $\mathcal{F}_0 \neq \cap_{t > 0} \mathcal{F}_t = \sigma(X)$.
-
Give an example of a process that is Markov but not strong Markov.
There are several simple examples. Here is one. Let $X(t)$ be the process
\[X(t) =
\begin{cases}
x + B(t) & X(0) = x \neq 0 \\
0 & X(0) = 0 \\
\end{cases}\]
where $B(t)$ is a standard Brownian motion.
We may check that it is Markov. For this, one has to check the following property.
\[\E_x [ f(X_{t+s}) | \mathcal{F}_t ] = \E_{X_t} [ f(X_s) ]\]
The left hand side is a conditional expectation. For $x=0$, this is easy to verify of course. So pick a set $B \in \mathcal{F}_t$ and let $x \neq 0$. Then, since $X_t$ is a Brownian motion, one only needs to note that $\Prob(X_t = 0) = 0$ for any fixed $t$, and therefore
\[\begin{align}
\E_x[ 1_B \E_x [ f(X_{t+s}) | \mathcal{F}_t ] ]
& = \E_x[ 1_{B \cap \{ X_t \neq 0 \} } \E_x [ f(X_{t+s}) | \mathcal{F}_t ] ] \\
& = \E_x[ 1_{B \cap \{ X_t \neq 0 \} } \E_{X_t} [ f(X_s) ] ]
\end{align}\]
where in the last step we just used the Markov property for Brownian motion.
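A toy simulation of the failure itself (my own sketch, with an Euler discretization; not from Durrett): started from $0$ the process is frozen there forever, but started from $x \neq 0$ it is a Brownian motion, so after it first hits $0$ it keeps moving. If the strong Markov property held, restarting at that hitting time would have to look like the process started from $0$, i.e., identically zero.

```python
# Toy illustration (my own sketch): Euler discretization of the example.
# Started at 0 the process stays at 0; started at 1 it is a random walk
# approximating Brownian motion, and it keeps moving after crossing 0 --
# which restarting "afresh from 0" (strong Markov) would forbid.
import math
import random

def path(x0, n=20_000, dt=1e-4, rng=random):
    """Discretized sample path of the example process started at x0."""
    if x0 == 0.0:
        return [0.0] * (n + 1)            # X(0) = 0  =>  X(t) = 0 for all t
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs

random.seed(0)
# Sample paths from x0 = 1 until one actually crosses zero within the horizon.
tau, p = None, None
while tau is None:
    p = path(1.0)
    tau = next((i for i, v in enumerate(p) if v < 0.0), None)

moved_after_hitting_zero = any(v != 0.0 for v in p[tau:])
frozen_at_zero = not any(path(0.0))
print(moved_after_hitting_zero, frozen_at_zero)  # True True
```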
I'll also sketch Ito's example, which is also quite simple. This is from the series of lectures he gave at the Tata Institute of Fundamental Research, Bombay, in 1961. He uses an almost deterministic process
\[\xi^{(a)}_t =
\begin{cases}
a + t & a \neq 0 \\
0 & a = 0, t < \tau(\w) \\
t - \tau(\w) & a = 0, t \geq \tau(\w)
\end{cases}\]
That is, the process just increases linearly from its starting point $a$, except when it starts at the origin $a=0$. In the latter case, it waits for an exponential waiting time $\tau(\w)$ and then starts increasing linearly.
Again, it’s easy to verify the Markov property. For the strong Markov property, Ito uses the hitting time $\sigma$ of the set $(0,\infty)$, and considers
\[\Prob_0( \sigma(\w) > 0, \sigma( \theta_\sigma \w) > 0)\]
where, like Durrett, I’ve used $\theta_t$ for the time-shift operator. It’s easy to show that the strong Markov property does not hold for this probability.
There are processes called Feller processes that always have the strong Markov property. It's worth visiting their definition to see how we're breaking the strong Markov property here. In the proof of the strong Markov property, Durrett uses the continuity of the function
\[\phi(s,x) = \E_x[ f(X_s) ]\]
where $f$ is some bounded continuous function. He then uses
\[\lim_{(s,x) \to (0,y)} \phi(s,x) = \phi(0,y)\]
This was easily shown to be true for Brownian motion, and essentially followed from the continuity of the process.
The Feller property is usually defined using the semigroup (in time) $P_t$ acting on the space of continuous functions $C(\R^d)$:
\[P_t f(x) = \E_x [ f(X_t) ]\]
where $X_t$ represents the Markov Process at time $t$. Of course, we just called this $\phi(t,x) = P_t f(x)$. A Feller semigroup on $C(\R^d)$ satisfies:
- $P_t f$ is a map from $C(\R^d)$ to $C(\R^d)$.
- $P_t$ forms a semigroup; i.e., $P_{t + s} = P_t P_s$
- $\Norm{P_t f}{} \leq \Norm{f}{}$ for all $t \geq 0$.
- $\lim_{t \to 0} \Norm{P_t f - f}{} = 0$
One has to build some theory to show that Feller semigroups define Markov processes, and then show that the process has the strong Markov property.
In our examples, we break the first property: $P_t f(x)$ does not turn out to be a continuous function for all continuous $f$. In our case, we get $P_t f(0) = f(0)$ for all bounded functions $f$. We then choose a function such that $\lim_{x \to 0} P_t f(x) \neq f(0)$.
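A small Monte Carlo check of this discontinuity for the first example (my own sketch; the test function $f(y) = e^{-y^2}$ is my choice): $P_t f(0) = f(0) = 1$ since the process is frozen at the origin, while for $x \neq 0$ we have $P_t f(x) = \E[f(x + B_t)] \to \E[f(B_t)] = 1/\sqrt{1+2t} < 1$ as $x \to 0$.

```python
# Monte Carlo check (my own sketch) that P_t f is discontinuous at 0 for the
# first example, with f(y) = exp(-y^2). Frozen at 0: P_t f(0) = f(0) = 1.
# Started at small x != 0: a Brownian motion, so P_t f(x) ~ 1/sqrt(1 + 2t).
import math
import random

def f(y):
    return math.exp(-y * y)

def P_t_f(x, t, n=200_000, rng=random):
    """E_x[f(X_t)] for the example process."""
    if x == 0.0:
        return f(0.0)                 # frozen at the origin
    sigma = math.sqrt(t)              # X_t = x + B_t ~ N(x, t)
    return sum(f(x + sigma * rng.gauss(0.0, 1.0)) for _ in range(n)) / n

random.seed(1)
print(P_t_f(0.0, 1.0))                # 1.0 exactly
print(P_t_f(1e-6, 1.0))               # ~ 1/sqrt(3) = 0.577..., far from 1
```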
More questions for the future: Give me a process that is adapted but not jointly measurable. Give me a process that is jointly measurable but not progressively measurable.
28 Apr 2018
I have a simple server with encrypted disks running Debian. I had trouble setting up a fully encrypted system, so I thought I'd share the details of my setup.
/dev/sda1 SSD: unencrypted boot partition
/dev/sda2 SSD: encrypted lvm partition containing system root partition. This is a luks partition.
/dev/sdb conventional 3TB encrypted hard disk with zfs mirrored raid
/dev/sdc conventional 3TB encrypted hard disk with zfs mirrored raid
/dev/<more-raid-disks>
When booting, grub loads the kernel in the boot partition, and the initramfs prompts for the decryption passphrase. Once I enter it, the initramfs scripts mount an lvm partition containing my encrypted root partition.
Then, systemd prompts for the passphrase again several times for each entry in my crypttab. This is a pain in the butt, although I don’t really have to reboot very often.
There is a simple solution to this: you create a second luks key that decrypts the raid array, and store it somewhere on the root partition (like /etc/luks-key1). Then you make a crypttab file like so:
# <target name> <source device> <key file> <options>
cryptroot UUID=<UUID> none luks
cryptzfs1 /dev/disk/by-id/<DEVICEID> /etc/luks-key1 luks
cryptzfs2 /dev/disk/by-id/<DEVICEID> /etc/luks-key1 luks
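For reference, the second key itself can be created and enrolled roughly like this (my sketch; `<DEVICEID>` is the placeholder from above, and this only works for luks devices, not plain dm-crypt):

```shell
# Generate a random keyfile on the encrypted root partition, lock down its
# permissions, and enroll it as an additional luks key for each raid disk.
dd if=/dev/urandom of=/etc/luks-key1 bs=512 count=4
chmod 0400 /etc/luks-key1
cryptsetup luksAddKey /dev/disk/by-id/<DEVICEID> /etc/luks-key1
```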
The initramfs loads the encrypted root partition, and systemd creates mount units for each crypttab entry using a generator. See
man systemd-cryptsetup-generator
for more details about this. Unfortunately, all my raid disks are configured as plain dm-crypt, and such a keyfile does not work with systemd. I did it this way because I was told that the luks header would cause alignment issues with zfs that would degrade filesystem performance. Well, plain dm-crypt is a real pain in the ass, and I'd rather take the minor performance hit in the future. My solution is to use debian's keyscript decrypt_keyctl to avoid having to re-enter the password a billion times. However, this comes with issues too, since systemd does not support the keyscript option in crypttab!
I have a simple workaround that others might find useful that I’ll detail below.
Initramfs setup
Copy the following files
cp -a /usr/share/initramfs-tools/scripts/local-top/cryptroot /etc/initramfs-tools/scripts/local-top/cryptroot
cp -a /usr/share/initramfs-tools/scripts/hooks/cryptroot /etc/initramfs-tools/scripts/hooks/cryptroot
cp -a /usr/share/initramfs-tools/scripts/local-block/cryptroot /etc/initramfs-tools/scripts/local-block/cryptroot
cp -a /usr/share/initramfs-tools/hooks/cryptkeyctl /etc/initramfs-tools/hooks/cryptkeyctl
I don’t think all of these files are required, but copy them anyway. Then, edit the file /etc/initramfs-tools/conf.d/cryptroot
and add a line
CRYPTOPTS=target=cryptroot,source=/dev/<device of root partition>,key=none,lvm=ssd1-debianroot
Do not run update-initramfs until you have all of this set up.
Crypttab
The next thing to do is to set up crypttab.
# <target name> <source device> <key file> <options>
cryptroot UUID=<UUID> none luks
cryptzfs1 /dev/disk/by-id/<DEVICEID> zfs_raidstore plain,cipher=aes-xts-plain64,hash=sha512,offset=0,size=512,keyscript=decrypt_keyctl,initramfs
cryptzfs2 /dev/disk/by-id/<DEVICEID> zfs_raidstore plain,cipher=aes-xts-plain64,hash=sha512,offset=0,size=512,keyscript=decrypt_keyctl,initramfs
The key is the keyscript=decrypt_keyctl option. It caches the passphrase in the kernel keyring using the /bin/keyctl command, so entries sharing a passphrase only prompt once. You need the keyutils package installed:
apt install keyutils
The zfs_raidstore field identifies which of the crypttab entries share the same passphrase.
The third ingredient is the initramfs option, which tells the initramfs to unlock these crypttab entries. Usually the initramfs would only unlock the root partition. If you didn't have this hook here, systemd would handle them instead. And systemd does not currently support the keyscript line in crypttab, as mentioned earlier.
Therefore, the systemd crypttab generator has to be disabled with the following line in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet luks.crypttab=no"
More options like luks.crypttab=no may be found in man systemd-cryptsetup-generator. You can test your crypttab setup with
cryptdisks_start <name in crypttab>
This is a required step before you run update-initramfs, since it appears to need the encrypted disks to be mounted. Back up your current initramfs (by adding a backup line in /etc/initramfs-tools/update-initramfs.conf, for example) and then run
update-initramfs -k <version> -u -v
and you should look for lines like
copied /bin/keyctl
calling hook cryptkeyctl
and so on. Then run
update-grub
and reboot. Hopefully this works for you.
Problems
The remaining problem is that the encrypted root partition is handled by the cryptroot script, which doesn't seem to honor the keyscript option. Therefore, I have to enter the password twice: once for the raid array and once for the root partition.
The way to fix this is to have extra luks keys for the raid partitions, as mentioned earlier. Unfortunately, my raid disks use plain dm-crypt. There are other solutions, of course: I could use a different keyscript that simply reads a file on the root partition and outputs the password. The complication with this is that you have to specify the device of the decrypted root partition the keyscript lives on (I have not had much success with this), or store the key unencrypted in the initramfs.
Thus, this was my slightly suboptimal solution. However, if I end up having 10 disks in the future, this will be quite useful.
17 Feb 2018
Summary
I've been getting a lot of back pain since I sit at my desk for extended periods without moving. I made a standing desk at my previous workplace in Utah using old textbooks. It worked great for a while, but it was hard to switch smoothly between standing and sitting.
So I did a bunch of research on standing desks before settling on the Jarvis L-shaped standing desk made by Ergodepot. Mine cost about $1400 including shipping. It's a good standing desk that's quite stable. The only problem with the whole experience was getting it shipped directly to my office; they didn't offer a service that would come and install it. I could have just put it together myself, and it would have been a lot less complicated, I suppose. In the end, I had it shipped to a local office supply and furniture store that brought it into my office and installed it. They did not do a really good job, and this made the experience a little disappointing. However, Jarvis' customer support was pretty good for the most part.
The Human Solution/UPLIFT also sells a similar L-shaped desk. It costs a little more, but they do have a service that will deliver to your office and not just to the University loading dock. I might go with their desk next time. I went with Jarvis because their longer desks have a lower crossbar that might help with stability.
One of the most important specifications for me was the desk range. The L-shaped Jarvis desk goes all the way down to 24” and goes all the way up to 48”. They call this the “extended range” version, and I highly recommend it.
My Research
I started with the wirecutter’s article on standing desks. They didn’t review too many desks, and so I was a little disappointed with their review. They did push the Jarvis hard though. They also reviewed UPLIFT’s competing desk.
So I did a bunch of digging around on the internet reading reviews. One thing to be wary about is that there are a lot of sites that look like legitimate review sites, but are actually manufacturers reviewing their own products. One of the worst offenders appears to be the iMovR desks.
The main things that seem to matter are the stability of the desk, and the reliability of the motors. The Jarvis and UPLIFT desks use the same base: it’s made by Jiecang, a Chinese manufacturer. They do seem to customize their desks a bit, though. For example, the Jarvis has a cross-bar in its L-shaped model that is supposed to help with stability a lot. Mine is pretty stable at all heights.
I've seen some manufacturer "reviewers" (like Vivo's online reviews) claim that the "cheap" Chinese electronics aren't as good as those from European manufacturers like Linak. I decided that these comments might be a little biased, and that I shouldn't really take them too seriously.
I looked at a few reddit reviews of the Jarvis, and I found this one useful. He has a bunch of pictures on his post. The review sounded legit since he didn’t buy the wiretamers and other options from Ergodepot and instead bought them for cheap from Amazon.
I looked into these other desks as well:
- Herman Miller. The University of Rochester has a furniture supplier that they recommend we all use, but the Herman Miller standing desks are pretty expensive, especially in the L-shaped configuration. The regular standing desks are very solid, good Herman Miller quality, and just a few hundred more than the Jarvis or the UPLIFT. It's totally worth it, but the lead time was way too long. I would have had to wait 3 weeks or more for the desk, whereas the online suppliers deliver in a few days.
- NextDesk. These have a crossbar, and they’re supposed to be high quality. Some reviews claim that despite charging a lot of money, they aren’t really great.
- UpDesk. They use Linak actuators and electronics, and many people say that this is better quality than the Jiecang actuators used by Jarvis. See this guy pushing it on reddit
- iMovR. I didn’t like them. There aren’t too many real reviews online, and they do a lot of negative marketing in the form of fake reviews.
- BOTD. Another website that reviews its own tables well. They use Linak electronics as well.
- Human Solution Uplift Desk. People like their desks, and despite worries about the wobble, most people say that it's hardly noticeable.
- GeekDesk. Not-so-great reviews online.
- Anthro Elevate II. Good reviews, but really expensive. It has a crossbar.
- Vertdesk. I don’t have much to say about it, but I did see some reviews comparing it to the Jarvis.
- Build your own. Get an electric base from Alibaba/Aliexpress or Amazon (there are a bunch there), though it's quite hard to source them without being a company. Then you can get a nice wooden tabletop from IKEA. This costs about the same as buying a readymade one.
Specifications
I priced out two nearly equivalent desks from Ergodepot and UPLIFT to compare costs.
Fully Jarvis
| Item | Price |
| --- | --- |
| Desk | $400.00 |
| Contour | $80.00 |
| Top Size 60” x 30” | $90.00 |
| 1 Powered Grommet | $39.00 |
| Desk Heights from 27.25” to 46.5” (save) | -$25.00 |
| Handset Programmable Memory | $35.00 |
| Locking Casters | $29.00 |
| Black WireTamer Cable Trays (2x) | $20.00 |
| Mat Standard Size - Black | $79.00 |
| 5 year warranty | $0.00 |
| Shipping/installation (contractor) | $275.00 |
| Total | $1022.00 |
Other configurations: a $782 build plus $275 shipping and installation comes to $1057; with everything added, it's $850 + $275 assembly = $1125.
UPLIFT
| Item | Price |
| --- | --- |
| Desk | $400.00 |
| Desktop size: 60" x 30" | +$90.00 |
| bambooEE: 1"-thick bamboo curve | +$20.00 |
| UPT019 advanced keypad | +$34.00 |
| UPLIFT accessory kit (mat, USB hub, stand, tray) | +$0.00 |
| One wire & one power grommet | +$39.00 |
| Wire management tray only | +$19.00 |
| Casters | +$29.00 |
| Black task light | +$59.00 |
| Half-circle drawer | +$29.00 |
| Warranty: 7-year, standard | |
| Shipping/installation (internal) | +$300.00 |
| Total | $1019.00 |
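As a quick sanity check (a throwaway snippet, with the numbers copied from the two itemized quotes above), the line items do add up to the quoted totals:

```python
# Line items from the Jarvis and UPLIFT quotes above,
# including shipping/installation.
jarvis = [400, 80, 90, 39, -25, 35, 29, 20, 79, 275]
uplift = [400, 90, 20, 34, 0, 39, 19, 29, 59, 29, 300]

print(sum(jarvis))  # 1022
print(sum(uplift))  # 1019
```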
But this is missing the contoured topo mat, which people really like; with it, the cost comes to $1,084. With the discounted option, it would be even cheaper.
Options
Most of these desks come with a bunch of options. I went with
- Programmable handset. This is invaluable.
- A topo mat. I like mine and find it occasionally useful.
- A little pen tray under the desk.
- WireTamer trays. These are great for wire management and for holding power strips.
People recommend against the casters, since they make the desk quite wobbly.
A manual desk for home
I decided that I don't really need a fancy desk for home, and went with a manual one. I tried the VIVO manual desk from Amazon. It costs about $250, and I used an old IKEA tabletop with it. It was a wee bit wobbly, which bothered me a little. The range was also lacking: it didn't go low enough. I ended up exchanging it for the Titan desk from here. I'm quite happy with it.
13 Feb 2018
Jeremy Quastel and I started working on this project when I visited the Fields Institute in June 2014. The most recent version of the paper fixes some serious errors, but it's still under review, and we have not posted the updated version on the arXiv.
When I was a graduate student, I had been reading about the Lindeberg-style argument that Terry Tao and Van Vu used in their random matrix universality papers from 2010. It's a simple argument, originally introduced by Lindeberg in 1922 to prove the central limit theorem. Davar Khoshnevisan believes the method is due to Lyapunov; I struggled through parts of Lyapunov's original paper in French to see if I could verify this, but I couldn't quite get through it. The method was strengthened with a truncation argument and repopularized by Trotter in 1959. More recently, it has been applied quite successfully to more modern models by Chatterjee (2005).
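To illustrate the replacement idea numerically (this sketch is my own, not from any of the papers above): for a smooth, bounded statistic of $n$ iid mean-zero, variance-one variables, swapping the summands one at a time for Gaussians barely changes the expectation, and telescoping these swaps is exactly Lindeberg's proof of the CLT.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 5000

def smooth_stat(x):
    # a smooth, bounded test function of the normalized sum
    return np.cos(x.sum(axis=1) / np.sqrt(n))

# two mean-zero, variance-one distributions for the summands
rademacher = rng.choice([-1.0, 1.0], size=(trials, n))
gaussian = rng.standard_normal((trials, n))

# Lindeberg: telescoping over one-at-a-time swaps bounds the difference
# of the two expectations; here we just observe that the Monte Carlo
# estimates nearly agree.
diff = abs(smooth_stat(rademacher).mean() - smooth_stat(gaussian).mean())
print(diff)  # small (Monte Carlo estimates nearly agree)
```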
I was trying to apply the method to first-passage percolation, and mentioned this to Jeremy when I started at the Fields Institute. But it's clear that a naive perturbation like this is not going to work: if one uses the method to prove that the fluctuations of the passage time are universal, one necessarily proves that the time-constant is universal as well. This is not true in general, at least in last-passage percolation (see Cox-Durrett 1981 for a first-passage example), so at the moment this does seem like a serious obstruction. Luckily, Jeremy had co-discovered the intermediate disorder regime in directed polymers a few years earlier, and he suggested that we try to apply it here.
The analog of the time-constant in the $d+1$-dimensional polymer model is the free energy. There are two parameters here: the length of the polymer and the inverse temperature $\beta$. The fluctuations of the free energy are thought to be given by the Tracy-Widom GUE (TW) distribution even in the intermediate disorder regime $O(1) \gg \beta \geq O(N^{-1/4})$. A non-rigorous argument for this appears in the physics paper by Alberts, Khanin and Quastel (2014). In the special regime when $\beta = \tilde\beta N^{-1/4}$, Alberts-Khanin-Quastel (2015) proved that the partition function of the polymer scales to the solution of the stochastic heat equation. The log of the solution to the stochastic heat equation has long-time $(\tilde\beta \to \infty)$ TW fluctuations (Amir-Corwin-Quastel 2011). So in the special intermediate-disorder double limit $N \to \infty$ followed by $\tilde\beta \to \infty$, it's known quite generally that the polymer has TW fluctuations. However, when $O(1) \gg \beta \gg O(N^{-1/4})$, the centered and rescaled log partition function of the polymer should converge directly to a TW random variable, without taking this double limit. This had not been proved for any standard discrete polymer.
The free-energy does indeed become “universal” in the intermediate disorder regime. So there is some hope of using the Lindeberg replacement strategy. Moreover, since $\beta \to 0$ in this regime at a given rate, we can use this to control the error in a Taylor expansion of the log partition function. For example, this $\beta \to 0$ property is implicit in the Sherrington-Kirkpatrick spin glass, and hence the Lindeberg strategy has been quite successful there.
The situation is complicated by the fact that no standard polymer is known to be in the TW universality class, so an "invariance theorem" is a little unsatisfactory. Seppalainen's log-gamma polymer is not quite a standard polymer, but it's solvable and can be shown to have TW fluctuations, at least for $\beta \geq \beta_* > 0$ (O'Connell-Seppalainen-Zygouras 2014, Borodin-Corwin-Remenik 2013, Borodin-Corwin-Ferrari-Veto 2015). We first show that the log-gamma polymer has TW fluctuations in intermediate disorder; then we use the Lindeberg strategy to prove that polymers close to the log-gamma polymer also have TW fluctuations.
28 Jan 2018
I had a useful conversation with Jon Chaika today where he explained some elementary ergodic theory to me. I really ought to know these things better, so I decided to write it down and work some of the details out.
Given any dynamical system $(\OFP,T)$, we can define a unitary operator on $L^2$:
\[\begin{align}
Uf(x) = f(Tx).
\end{align}\]
The unitarity follows from the fact that $T$ preserves measure. We express the ergodic average as
\[S_n(f) = \frac1{n} \sum_{i=1}^n U^i f\]
The ergodic theorem says that for any $L^1$ function, $S_nf \to \overline{f}$ almost surely and in $L^1$. Here, $\overline{f}$ is the conditional expectation with respect to the invariant $\sigma$-algebra of $T$. We will assume ergodicity; i.e., the invariant $\sigma$-algebra is trivial.
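As a concrete illustration (my own, not from the conversation), here is the ergodic average for the circle rotation $Tx = x + \alpha \bmod 1$, which is ergodic for irrational $\alpha$; $S_n(f)$ converges to the spatial mean $\int_0^1 f$.

```python
import numpy as np

# Birkhoff averages for the circle rotation T(x) = x + alpha mod 1,
# which is ergodic when alpha is irrational.
alpha = np.sqrt(2) - 1
f = lambda x: np.cos(2 * np.pi * x) ** 2   # integral over [0,1] is 1/2

x0, n = 0.1, 100_000
orbit = (x0 + alpha * np.arange(1, n + 1)) % 1.0   # T x0, T^2 x0, ...
S_n = f(orbit).mean()
print(S_n)  # ≈ 0.5, the spatial mean of f
```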
The usual notions of ergodicity and mixing can be described in terms of the spectrum of the unitary operator $U$. For example, $T$ is ergodic if and only if the eigenvalue $1$ is simple; in other words, the only eigenfunctions with eigenvalue $1$ are the constants.
A system is weak-mixing if $U$ has no eigenvalues other than $1$. In terms of the operator $U$, a system is weak-mixing iff for all $f, g \in L^2$,
\[\frac1{n} \sum_{k=1}^{n} \left\lvert \E[ g\, U^k f] - \E[ g \overline{f}] \right\rvert \to 0.\]
From this it is easy to argue as follows. Suppose there is a $\lambda \neq 1$ (but $\lvert\lambda\rvert = 1$) such that
\[Uf = \lambda f\]
for some nonzero $f$. Since $\E[f] = \E[Uf] = \lambda \E[f]$, we must have $\E[f] = 0$, and so $\overline{f} = 0$ by ergodicity. Taking $g$ to be the complex conjugate of $f$, each term in the average equals
\[\lvert \lambda^k \rvert\, \E[\lvert f \rvert^2] = \E[\lvert f \rvert^2] > 0,\]
so the averages cannot tend to zero. Thus a weak-mixing system has no eigenvalues other than $1$, and its only eigenfunctions are the constants.
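Numerically (a hypothetical illustration, not part of the original note): for the circle rotation, $f(x) = e^{2\pi i x}$ is an eigenfunction of $U$ with eigenvalue $\lambda = e^{2\pi i \alpha} \neq 1$, and the Cesàro averages of $\lvert \E[\bar f\, U^k f] \rvert$ stay at $1$ instead of vanishing, so the rotation is not weak-mixing.

```python
import numpy as np

# Circle rotation T(x) = x + alpha mod 1: f(x) = exp(2*pi*i*x) satisfies
# U f = lambda * f with lambda = exp(2*pi*i*alpha), and E[f] = 0.
alpha = np.sqrt(2) - 1
lam = np.exp(2j * np.pi * alpha)

# correlations E[conj(f) U^k f] = lambda^k * E[|f|^2] = lambda^k
k = np.arange(1, 5001)
cesaro = np.abs(lam ** k).mean()   # Cesàro average of |lambda^k| = 1
print(cesaro)  # 1.0 up to rounding: no decay, so not weak-mixing
```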
There is a spectral description of strong mixing as well, but it's in terms of the spectral measure of $U$. The requirement is that the spectral measure (restricted to the orthogonal complement of the constants) is Rajchman; that is, its Fourier coefficients tend to zero.
The dyadic machine
One frequently hears that the obstructions to weak-mixing are compact group rotations. My impression was that one should think of systems that are not weak-mixing as containing a copy of the standard rotation on the unit circle, or a product of such rotations. This is not correct: systems that are not weak-mixing can contain much more complicated compact group rotations.
A canonical example of a compact group rotation is the so-called dyadic machine. Dyadic machines can have really complicated spectra.
The system is defined on $\W = \{0,1\}^{\mathbb{N}}$. Suppose $x = (1,1,0,1,0,\ldots) \in \W$. Then, $Tx = (0,0,1,1,0,\ldots)$. That is, you add $1 \mod 2$ to the first coordinate and carry over the remainder to the next coordinate. Precisely,
\[\begin{align}
(Tx)_0 & = (x_0 + 1) \mod 2 \quad \AND x'_1 := x_0\\
(Tx)_i & = (x_i + x'_i) \mod 2 \quad \AND x'_{i+1} := x_i\, x'_i
\end{align}\]
where the $x'_i$ is simply used to track the “carrying over” operation. You can think of $\W$ as the unit interval by mapping each point to its binary expansion. In this case, the map acts quite weirdly on the unit interval.
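The map is easy to sketch on finite prefixes of the sequence space (a minimal illustration of my own; coordinates are ordered with $x_0$ first, and a carry off the end of the prefix is dropped):

```python
def odometer(x):
    """Add 1 (mod 2) to the first coordinate and propagate the carry."""
    x, carry = list(x), 1
    for i in range(len(x)):
        x[i], carry = (x[i] + carry) % 2, (x[i] + carry) // 2
    return tuple(x)

print(odometer((1, 1, 0, 1, 0)))  # -> (0, 0, 1, 1, 0)
```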
This map is uniquely ergodic, and its unique invariant measure is the \(\Bernoulli(1/2)\) measure. The unique ergodicity of this iid measure is quite surprising to me. It means that there are no bad sets to consider in the ergodic theorem; things converge “surely” and not just almost surely.
Now, one notices that
\[T 1_{x_0 = 0} = 1_{x_0 = 1} \AND T 1_{x_0 = 1} = 1_{x_0 = 0}\]
and therefore \(1_{x_0 = 0} - 1_{x_0 = 1}\) has eigenvalue $-1$. Similarly,
\[\begin{align*}
T 1_{x_0 = 0,x_1 = 1} & = 1_{x_0 = 1,x_1 = 0}\\
T 1_{x_0 = 1,x_1 = 0} & = 1_{x_0 = 0,x_1 = 0}\\
T 1_{x_0 = 0,x_1 = 0} & = 1_{x_0 = 1,x_1 = 1}\\
T 1_{x_0 = 1,x_1 = 1} & = 1_{x_0 = 0,x_1 = 1}
\end{align*}\]
This implies that, for example,
\[1_{x_0 = 0,x_1 = 1} + s\, 1_{x_0 = 1,x_1 = 0} + s^2\, 1_{x_0 = 0,x_1 = 0} + s^3\, 1_{x_0 = 1,x_1 = 1}\]
is an eigenvector (with eigenvalue $s^{-1}$) whenever $s^4 = 1$; that is, whenever $s$ is a fourth root of unity! There are a whole host of such eigenvectors and corresponding eigenvalues. One can change the space to $\W = \{0,\ldots,k-1\}^{\mathbb{N}}$ and change the operation to addition mod $k$ with carryover to get other spectra. Despite its complexity, the dyadic machine is not weak-mixing. Moreover, it's a compact group rotation.
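One can check the spectrum on these cylinder sets directly (a hypothetical sketch): on the four indicators, $U$ acts as the cyclic permutation $1_{(0,1)} \mapsto 1_{(1,0)} \mapsto 1_{(0,0)} \mapsto 1_{(1,1)} \mapsto 1_{(0,1)}$, and the eigenvalues of a $4$-cycle are exactly the fourth roots of unity.

```python
import numpy as np

# U restricted to the indicators of the four cylinder sets on (x_0, x_1),
# ordered [1_{(0,1)}, 1_{(1,0)}, 1_{(0,0)}, 1_{(1,1)}], is the 4-cycle
# permutation matrix e_i -> e_{(i+1) mod 4}.
U = np.zeros((4, 4))
for i in range(4):
    U[(i + 1) % 4, i] = 1.0

eigs = np.linalg.eigvals(U)
print(np.allclose(eigs ** 4, 1.0))  # True: all eigenvalues are 4th roots of unity
```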
This note is clearly incomplete: I haven’t really specified what a compact group rotation is, and I haven’t proved that the dyadic machine is a compact group rotation. I intend to revisit this at some point in the future. Some of these concepts can be found in T. Austin’s notes. The notes also prove that the obstructions to weak mixing are such compact group rotations.