
Identification, estimation and applications of a bivariate long-range dependent time series model with general phase

Stefanos Kechagias (SAS Institute) and Vladas Pipiras (University of North Carolina)

March 27, 2018
Abstract
A new bivariate, two-sided, fractionally integrated time series model that allows for general
phase is proposed. In particular, the focus is on a suitable parametrization under which the
model is identifiable. A simulation study is carried out to assess the performance of a conditional maximum likelihood estimation method and of a forecasting approach under the proposed model. Finally, an application is presented to the U.S. inflation rates in goods and services, where models not allowing for general phase suffer from misspecification.

1 Introduction
In this work, we are interested in modeling bivariate (R2 –vector) time series exhibiting long-range
dependence (LRD, in short). In the univariate case, long-range dependent (LRD) time series
models are stationary with the autocovariance function decaying slowly like a power-law function
at large lags, or the spectral density diverging like a power-law function at the zero frequency. The
univariate LRD is understood well in theory and used widely in applications. See, for example,
Park and Willinger (2000), Robinson (2003), Doukhan et al. (2003), Palma (2007), Giraitis et al.
(2012), Beran et al. (2013), Pipiras and Taqqu (2017).
Bivariate and, more generally, multivariate (vector-valued) LRD time series models have also
been considered by a number of researchers. But theoretical foundations for a general class of
such models were laid only recently in Kechagias and Pipiras (2015). In particular, Kechagias and
Pipiras (2015) stressed the importance of the so-called phase parameters. Turning to the bivariate
case which is the focus of this work, the phase φ appears in the cross-spectrum of a bivariate LRD
series around the zero frequency and controls the (a)symmetry of the series at large time lags.
There are currently no parametric models of bivariate LRD with general phase that can be used in
estimation and applications. The goal of this work is to introduce such a class of models, and to
examine it through a simulation study and an application to real data. In the rest of this section,
we describe our contributions in greater detail.
AMS subject classification. Primary: 62M10, 62M15. Secondary: 60G22, 42A16.
Keywords and phrases: long-range dependence, bivariate time series, phase parameter, estimation, VARFIMA.
The second author was supported in part by NSA grant H98230-13-1-0220 and NSF grant DMS-1712966. The authors would also like to thank Richard Davis (Columbia University) for his comments on an earlier version of this paper.

A common parametric VARFIMA(0, D, 0) (Vector Autoregressive Fractionally Integrated Moving Average) model for a bivariate LRD time series {Xn}n∈Z = {(X1,n, X2,n)'}n∈Z is obtained as a natural extension of the univariate ARFIMA(0, d, 0) model by fractionally integrating the component series of a bivariate white noise series, namely,

$$X_n = \begin{pmatrix} X_{1,n} \\ X_{2,n} \end{pmatrix} = \begin{pmatrix} (I - B)^{-d_1} & 0 \\ 0 & (I - B)^{-d_2} \end{pmatrix} \begin{pmatrix} \eta_{1,n} \\ \eta_{2,n} \end{pmatrix} = (I - B)^{-D} \eta_n, \tag{1.1}$$

where B is the backshift operator, I = B^0 is the identity operator, d1, d2 ∈ (0, 1/2) are the LRD parameters of the component series {X1,n}n∈Z and {X2,n}n∈Z, respectively, D = diag(d1, d2) and {ηn}n∈Z = {(η1,n, η2,n)'}n∈Z is a bivariate white noise series with zero mean Eηn = 0 and covariance Eηn ηn' = Σ. If Σ = QQ', note that the model (1.1) can also be written as

$$X_n = (I - B)^{-D} Q \epsilon_n \quad \text{or} \quad (I - B)^{D} X_n = \eta_n = Q \epsilon_n, \tag{1.2}$$

where {εn}n∈Z is a bivariate white noise with the identity covariance matrix Eεn εn' = I2. Throughout the paper, the prime indicates the transpose.
The model (1.1) admits a one-sided linear representation of the form
$$X_n = \sum_{k \in I} \widetilde{\Psi}_k\, \eta_{n-k} = \sum_{k \in I} \Psi_k\, \epsilon_{n-k}, \tag{1.3}$$

where I = {k ∈ Z : k ≥ 0} and Ψ̃k, Ψk are real-valued 2 × 2 matrices whose entries decay as a power law as k → ∞. In the frequency domain, the matrix-valued spectral density function¹ f(λ) of the series Xn defined in (1.1) satisfies

$$f(\lambda) \sim \begin{pmatrix} \omega_{11} |\lambda|^{-2d_1} & \omega_{12} |\lambda|^{-(d_1+d_2)} e^{-i\,\mathrm{sign}(\lambda)\phi} \\ \omega_{21} |\lambda|^{-(d_1+d_2)} e^{i\,\mathrm{sign}(\lambda)\phi} & \omega_{22} |\lambda|^{-2d_2} \end{pmatrix}, \quad \text{as } \lambda \to 0, \tag{1.4}$$

where ∼ indicates the asymptotic equivalence, ω11, ω12, ω21, ω22 ∈ R and

$$\phi = (d_1 - d_2)\pi/2. \tag{1.5}$$

The asymptotic behavior (1.4) of the spectral density f with general φ ∈ (−π/2, π/2) is taken
for the definition of bivariate LRD in Kechagias and Pipiras (2015). Note that φ ∈ (−π/2, π/2) is assumed, and the following polar coordinate representation

$$z = \frac{z_1}{\cos(\phi)}\, e^{i\phi} \quad \text{with} \quad \phi = \arctan\Big(\frac{z_2}{z_1}\Big) \tag{1.6}$$

of z = z1 + iz2 ∈ C is used throughout. The special form of the phase parameter φ in (1.5) limits
the type of bivariate LRD behavior that can be captured by the model (1.1). For example, in the
case of time-reversible models satisfying γ(n) = γ(−n), n ∈ Z, the spectral density matrix f has
real-valued entries and hence φ = 0. Under the model (1.1) and (1.4), however, φ = 0 holds only
when d1 = d2 . These observations naturally raise the following question:

Can one define a bivariate parametric LRD model with general phase?
¹ The following convention is used here. The autocovariance function γ is defined as γ(n) = EXn X0' and the spectral density f(λ) satisfies $\gamma(n) = \int_{-\pi}^{\pi} e^{in\lambda} f(\lambda)\, d\lambda$. The convention is different from Kechagias and Pipiras (2015), where EX0 Xn' is used as the autocovariance function, but is the same as in Brockwell and Davis (2009), Pipiras and Taqqu (2017). See also Remark 2.2 below.
One solution to the question above is to consider two-sided linear representations with power-
law decaying coefficients, that is, representations of the form (1.3) with the index set I now be-
ing the set of all integers Z. Specifically, Kechagias and Pipiras (2015) constructed a two-sided
VARFIMA(0, D, 0) model with general phase by taking

$$X_n = (I - B)^{-D} Q_+ \epsilon_n + (I - B^{-1})^{-D} Q_- \epsilon_n, \tag{1.7}$$

where Q+, Q− are two real-valued 2 × 2 matrices. The reason we refer to (1.7) as two-sided is the presence of B^{-1} in the second term of the right-hand side of (1.7), which translates into having the leads of the innovation process εn. Also, the positive and negative powers of the backshift operator motivate our notation for the subscripts of the matrices Q+ and Q−.
We shall use (1.7) in developing our parametric bivariate LRD model with general phase. A
first issue that needs to be addressed is finding a suitable parametrization under which this model is
identifiable, while still yielding a general phase parameter. We show in Section 2 that this two-fold
goal can be achieved by taking Q− as
 
$$Q_- = \begin{pmatrix} c & 0 \\ 0 & -c \end{pmatrix} Q_+ =: C Q_+, \tag{1.8}$$

for some real constant c. Under the relation (1.8) and letting {Zn}n∈Z be a zero mean bivariate white noise series with covariance matrix EZn Zn' = Q+ Q+' =: Σ, we can rewrite (1.7) in the more succinct form

$$X_n = \Delta_c(B)^{-1} Z_n, \tag{1.9}$$

where the operator ∆c(B)^{-1} is defined as

$$\Delta_c(B)^{-1} = (I - B)^{-D} + (I - B^{-1})^{-D} C. \tag{1.10}$$

Note that when c = 0 (and C = 0), the filter ∆c(B)^{-1} becomes the one-sided fractional integration filter ∆0(B)^{-1} = (I − B)^{-D}.
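For intuition, the following minimal Python sketch (our illustration, not the simulation method used in Section 5, which relies on the exact circulant-embedding synthesis of Helgason et al. (2011)) approximates (1.9)–(1.10) by truncating the binomial expansions of (I − B)^{-D} and (I − B^{-1})^{-D} at M lags and leads; the function names and the truncation level M are assumptions of the sketch.

```python
import numpy as np

def frac_coefs(d, M):
    # Coefficients pi_k(d) of (1 - B)^{-d} = sum_k pi_k(d) B^k, via pi_k = pi_{k-1} (k - 1 + d) / k
    pi = np.empty(M + 1)
    pi[0] = 1.0
    for k in range(1, M + 1):
        pi[k] = pi[k - 1] * (k - 1 + d) / k
    return pi

def simulate_two_sided_varfima0(N, d1, d2, c, Sigma, M=500, seed=0):
    # Truncated version of X_n = (I - B)^{-D} Z_n + (I - B^{-1})^{-D} C Z_n in (1.9)-(1.10)
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(2), Sigma, size=N + 2 * M)   # padded white noise
    pis = np.column_stack([frac_coefs(d1, M), frac_coefs(d2, M)])     # Pi_k = diag(pi_k(d1), pi_k(d2))
    Csign = np.array([c, -c])                                         # diagonal of C in (1.8)
    X = np.zeros((N, 2))
    for n in range(N):
        m = n + M                                                     # index of Z_n in the padded array
        past = Z[m - M:m + 1][::-1]                                   # Z_n, Z_{n-1}, ..., Z_{n-M}
        future = Z[m:m + M + 1]                                       # Z_n, Z_{n+1}, ..., Z_{n+M}
        X[n] = (pis * past).sum(axis=0) + Csign * (pis * future).sum(axis=0)
    return X
```

The truncation introduces a bias at low frequencies because the coefficients decay only like a power law, which is precisely why the exact synthesis method is preferred in Section 5; the sketch is meant only to make the structure of the two-sided filter concrete.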
The focus of Section 3 will be on extensions of the model (1.9)–(1.10) involving autoregressive
and moving average parts, namely, a general phase VARFIMA(p, D, q) model

Φ(B)Xn = ∆c (B)−1 Θ(B)Zn , (1.11)

where Φ(B), Θ(B) are matrix polynomials of finite orders p and q satisfying the usual stationarity
and invertibility conditions. In fact, for identifiability and estimation purposes, we shall work with
diagonal AR filters Φ(B) in which case the general phase VARFIMA(p, D, q) model (1.11) can also
be expressed as
Φ(B)∆c (B)Xn = Θ(B)Zn , (1.12)
since the diagonal filters Φ(B), ∆c (B) commute. (In fact, the model (1.12) is usually referred to as
FIVARMA – see Section 3.2 below.) The advantage of the model (1.11), as we show, is that the
autocovariance function of the right-hand side of (1.11) can be computed explicitly. In estimation,
we can then employ a conditional likelihood approach where the Gaussian likelihood is written for
Φ(B)Xn (though the maximum is still sought over all unknown parameters).
We should also emphasize that the analysis of this work is limited to the second-order properties
of time series (that is, autocovariance, cross-correlation, cross-spectrum, and so on). Thus, although
the models (1.7) and (1.11)–(1.12) are expressed through two-sided and hence non-causal linear
representations, their noncausal nature is irrelevant to the extent that these models are used only as suitable parameterizations of bivariate long-range dependent models allowing for general phase
through their second-order properties.
Our estimation procedure follows the approach of Tsay (2010) who considered one-sided models
(1.12) with c = 0. Still in the case c = 0, Sowell (1986) calculated (numerically) the autocovariance
function of the model (1.11) and performed exact likelihood estimation. For other approaches (all
in the case c = 0), see also Ravishanker and Ray (1997) who considered the Bayesian analysis and
Pai and Ravishanker (2009a, 2009b) who employed the EM and PCG algorithms, as well as Dueker and Startz (1998), Martin and Wilkins (1999), Sela and Hurvich (2009) and Diongue (2010).
The rest of the paper is structured as follows. General phase VARFIMA(0, D, 0) and
VARFIMA(p, D, q) series are presented in Sections 2 and 3. Estimation and other tasks are con-
sidered in Section 4. Section 5 contains a simulation study, and Section 6 contains an application
to the U.S. inflation rates.

2 General phase VARFIMA(0, D, 0) series


In this section, we consider the two-sided bivariate VARFIMA(0, D, 0) model (1.9)–(1.10). Kecha-
gias and Pipiras (2015) showed that any phase parameter φ in (1.4) can be obtained with the model
(1.7) for an appropriate choice of Q+ and Q− . However, letting the entries of these matrices take
any real value causes identifiability issues around the zero frequency, as the same phase param-
eter can be obtained by more than one choice of Q+ and Q− . Indeed, from the following simple
counting perspective, note that the specification (1.4) has 6 parameters (d1 , d2 , ω11 , ω12 , ω22 and
φ) whereas the model (1.7) has 10 (d1 , d2 and the entries of Q+ and Q− ). One might naturally
expect identifiability up to Q+ Q0+ and Q− Q0− but this still leaves the number of parameters at 8
(d1 , d2 and the 6 different entries of Q+ Q0+ and Q− Q0− ). In Proposition 2.1 and Corollary 2.1 below
(see also the discussion following the latter), we show that the parameterization (1.8) addresses the
identifiability and general phase issues. For one, note that the model (1.9)–(1.10) has the required
number of 6 parameters (d1 , d2 , c and the three different entries of Σ = Q+ Q0+ ).

Proposition 2.1 Let d1, d2 ∈ (0, 1/2) and Q+ be a 2 × 2 matrix with real-valued entries. Let also {Xn^(c)}n∈Z be a time series defined by (1.9)–(1.10) where D = diag(d1, d2). For any φc ∈ (−π/2, π/2), there exists a unique constant c ∈ (−1, 1) such that the series {Xn^(c)}n∈Z has the phase parameter φ = φc in (1.4). Moreover, the constant c has a closed form given by

$$c = c(\phi_c) = \begin{cases} \dfrac{a_1 - a_2}{a_1 + a_2}, & \text{if } \phi_c = \arctan\dfrac{a_1 - a_2}{a_1 + a_2}, \\[2ex] \dfrac{2(a_1 + a_2) - \sqrt{\Delta}}{2\big(a_1 - a_2 + \tan(\phi_c)(1 + a_1 a_2)\big)}, & \text{otherwise}, \end{cases} \tag{2.1}$$

where

$$a_1 = \tan\Big(\frac{\pi d_1}{2}\Big), \quad a_2 = \tan\Big(\frac{\pi d_2}{2}\Big) \quad \text{and} \quad \Delta = 16 a_1 a_2 + 4(1 + a_1 a_2)^2 \tan^2(\phi_c). \tag{2.2}$$

The function c = c(φc) in (2.1) is continuous at φc = arctan((a1 − a2)/(a1 + a2)).
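As a quick numerical companion to (2.1)–(2.2) (our own snippet, not code from the paper), the map from a target phase φc to the constant c can be coded directly:

```python
import math

def c_from_phase(phi_c, d1, d2):
    # Invert the phase-to-c map: equations (2.1)-(2.2)
    a1, a2 = math.tan(math.pi * d1 / 2), math.tan(math.pi * d2 / 2)
    if math.isclose(phi_c, math.atan((a1 - a2) / (a1 + a2))):
        return (a1 - a2) / (a1 + a2)                   # the linear (boundary) case in (2.1)
    Delta = 16 * a1 * a2 + 4 * (1 + a1 * a2) ** 2 * math.tan(phi_c) ** 2
    return (2 * (a1 + a2) - math.sqrt(Delta)) / (2 * (a1 - a2 + math.tan(phi_c) * (1 + a1 * a2)))

# Sanity check: the one-sided phase (1.5), phi = (d1 - d2) * pi / 2, should give c close to 0
print(c_from_phase((0.2 - 0.4) * math.pi / 2, d1=0.2, d2=0.4))
```

The sanity check reflects Remark 2.5 below: c = 0 recovers the one-sided model, whose phase is (d1 − d2)π/2 rather than 0.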

Proof: By using Theorem 11.8.3 in Brockwell and Davis (2009), the VARFIMA(0, D, 0) series in (1.9)–(1.10) has a spectral density matrix

$$f(\lambda) = \frac{1}{2\pi}\, \Delta_c(e^{-i\lambda})^{-1}\, \Sigma\, \big(\Delta_c(e^{-i\lambda})^{-1}\big)^*, \tag{2.3}$$

where the superscript * denotes the complex conjugate operation. From (1.8) and by using the fact that 1 − e^{±iλ} ∼ ∓iλ, as λ → 0, we have

$$f(\lambda) \sim \frac{1}{2\pi} \big((i\lambda)^{-D} + (-i\lambda)^{-D} C\big)\, \Sigma\, \big((-i\lambda)^{-D} + C (i\lambda)^{-D}\big), \quad \text{as } \lambda \to 0. \tag{2.4}$$

Next, by denoting Σ = (σjk )j,k=1,2 and using the relation ±i = e±iπ/2 , we get that the (j, k) element
of the spectral density f (λ) satisfies

fjk (λ) ∼ gjk λ−(dj +dk ) , as λ → 0+ , (2.5)

where the complex constant gjk is given by


$$g_{jk} = \frac{\sigma_{jk}}{2\pi}\big(e^{-i\pi d_j/2} + (-1)^{j+1} c\, e^{i\pi d_j/2}\big) \cdot \big(e^{i\pi d_k/2} + (-1)^{k+1} c\, e^{-i\pi d_k/2}\big) \tag{2.6}$$

and (−1)^{j+1}, (−1)^{k+1} in (2.6) account for the different signs next to c's in the diagonal matrix C in (1.8). Focusing on the (1, 2) element, and by applying the polar-coordinate representation z = (z1/cos(φ)) e^{iφ} of z = z1 + iz2 ∈ C with φ = arctan(z2/z1) (see (1.6) above) to the two multiplication terms below separately, we have

$$g_{12} = \frac{\sigma_{12}}{2\pi}\Big(\cos\big(\tfrac{\pi d_1}{2}\big)(1 + c) + i \sin\big(\tfrac{\pi d_1}{2}\big)(c - 1)\Big) \cdot \Big(\cos\big(\tfrac{\pi d_2}{2}\big)(1 - c) + i \sin\big(\tfrac{\pi d_2}{2}\big)(1 + c)\Big) = \frac{\sigma_{12} \cos(\tfrac{\pi d_1}{2}) \cos(\tfrac{\pi d_2}{2})}{2\pi \cos(\phi_{c,1}) \cos(\phi_{c,2})}\, (1 - c^2)\, e^{-i\phi_c}, \tag{2.7}$$

where

$$\phi_c = -(\phi_{c,1} + \phi_{c,2}), \quad \phi_{c,1} = \arctan\Big(a_1 \frac{c-1}{1+c}\Big) \quad \text{and} \quad \phi_{c,2} = \arctan\Big(a_2 \frac{1+c}{1-c}\Big) \tag{2.8}$$

with a1 and a2 given in (2.2).


By using the arctangent addition formula arctan(u) + arctan(v) = arctan((u + v)/(1 − uv)) for uv < 1 (in our case uv = −a1 a2 < 0), we can rewrite φc as

$$\phi_c = -\arctan\left(\frac{a_1 \frac{c-1}{1+c} + a_2 \frac{1+c}{1-c}}{1 + a_1 a_2}\right) =: h(c). \tag{2.9}$$

For all d1 , d2 ∈ (0, 1/2), the function h : (−1, 1) → (−π/2, π/2) is strictly decreasing (and therefore
1-1) and also satisfies
$$\lim_{c \downarrow -1} h(c) = \frac{\pi}{2}, \qquad \lim_{c \uparrow 1} h(c) = -\frac{\pi}{2}.$$
Since h is continuous, it is also onto its range which completes the existence and uniqueness part
of the proof.
To obtain the formula (2.1), we invert the relation (2.9) to get the quadratic equation

(a1 − a2 + tan(φc )(1 + a1 a2 ))c2 − 2(a1 + a2 )c + a1 − a2 − tan(φc )(1 + a1 a2 ) = 0, (2.10)

whose discriminant ∆ is given by

$$\Delta = 16 a_1 a_2 + 4(1 + a_1 a_2)^2 \tan^2(\phi_c)$$

and is always positive. The solutions of (2.10) are then given by

$$c_1 = \frac{2(a_1 + a_2) + \sqrt{\Delta}}{2\big(a_1 - a_2 + \tan(\phi_c)(1 + a_1 a_2)\big)}, \qquad c_2 = \frac{2(a_1 + a_2) - \sqrt{\Delta}}{2\big(a_1 - a_2 + \tan(\phi_c)(1 + a_1 a_2)\big)}.$$

It can be checked that c1 ∉ (−1, 1) and c2 ∈ (−1, 1). Note that, when a1 − a2 + tan(φc)(1 + a1 a2) = 0 or φc = arctan((a1 − a2)/(a1 + a2)), the quadratic equation (2.10) becomes a linear equation with the solution

$$c = \frac{a_1 - a_2 - \tan(\phi_c)(1 + a_1 a_2)}{2(a_1 + a_2)} = \frac{a_1 - a_2}{a_1 + a_2},$$

which always satisfies c ∈ (−1, 1). Finally, the fact that the function c = c(φc) in (2.1) is continuous at φc = arctan((a1 − a2)/(a1 + a2)) can be checked easily.
The following result is a direct consequence of the proof of Proposition 2.1.
Corollary 2.1 The spectral density of the time series {Xn^(c)}n∈Z in Proposition 2.1 satisfies the asymptotic relation (1.4) with φ = φc and

$$\omega_{jj} = \frac{\sigma_{jj}}{2\pi}\big(1 + c^2 + (-1)^{j+1}\, 2c \cos(\pi d_j)\big), \quad j = 1, 2, \tag{2.11}$$

$$\omega_{12} = \frac{\sigma_{12} \cos(\tfrac{\pi d_1}{2}) \cos(\tfrac{\pi d_2}{2})}{2\pi \cos(\phi_{c,1}) \cos(\phi_{c,2})}\, (1 - c^2), \tag{2.12}$$

where Σ = Q+ Q+' = (σjk)j,k=1,2 and φc,1, φc,2 are given in (2.8).

Proof: The relations (2.11)–(2.12) follow from (2.5)–(2.6) and (2.7)–(2.8). 


Corollary 2.1 shows that the bivariate LRD model (1.9)–(1.10) is identifiable around the zero
frequency when parametrized by d1 , d2 , Σ = Q+ Q0+ and c. It will be referred to as the general phase
VARFIMA(0, D, 0) series (two-sided VARFIMA(0, D, 0) series).

Remark 2.1 Proposition 2.1 relates the phase φ at the zero frequency and the constant c which
appears in the full model (1.9)–(1.10). For this model, however, the phase function φ(λ) of the full
cross spectral density f12 (λ) = g12 (λ)e−iφ(λ) or f21 (λ) = g12 (λ)eiφ(λ) , λ ∈ (0, π), is not a constant
function of the frequency λ. Instead, by using the identity $1 - e^{\mp i\lambda} = 2\sin(\tfrac{\lambda}{2})\, e^{\mp i(\lambda-\pi)/2}$ and arguing as for (2.7)–(2.9) above, it can be shown that

$$\phi(\lambda) = -\arctan\left(\frac{x_1(\lambda)\frac{c-1}{1+c} + x_2(\lambda)\frac{1+c}{1-c}}{1 + x_1(\lambda)\, x_2(\lambda)}\right), \tag{2.13}$$

where

$$x_1(\lambda) = \tan\Big(\frac{d_1(\pi - \lambda)}{2}\Big), \qquad x_2(\lambda) = \tan\Big(\frac{d_2(\pi - \lambda)}{2}\Big). \tag{2.14}$$
Several plots of the phase function (2.13) are given in Figure 1. We also note that Sela (2010)
considers LRD models with phase functions φ(λ) following special power laws, but we will not
expand in this direction.
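For completeness, a short illustrative snippet (ours, not the paper's code) evaluating the phase function (2.13)–(2.14) on a frequency grid, e.g. to reproduce curves like those in Figure 1; the function name and grid are assumptions of the sketch.

```python
import numpy as np

def phase_function(lam, d1, d2, c):
    # Phase phi(lambda) of the cross-spectrum, per (2.13)-(2.14); requires c in (-1, 1)
    x1 = np.tan(d1 * (np.pi - lam) / 2)
    x2 = np.tan(d2 * (np.pi - lam) / 2)
    return -np.arctan((x1 * (c - 1) / (1 + c) + x2 * (1 + c) / (1 - c)) / (1 + x1 * x2))

lam = np.linspace(0.01, np.pi - 0.01, 200)
phi = phase_function(lam, d1=0.2, d2=0.4, c=0.5)   # one of the parameter sets in Figure 1
```

As λ → 0 the value approaches φc in (2.9), and at λ = π the phase is 0, consistent with the behavior seen in Figure 1.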

Figure 1: Phase functions φ(λ) for the model (1.9)–(1.10) for different parameter values. Left: c = 0, 0.5, −0.8, −0.4 and d1 = 0.2, d2 = 0.4. Right: c = 0, −0.7, 0.7, −0.3, d1 = d2 = 0.3.

Figure 2: The function bd(c), c ∈ (−1, 1), in (2.15). Left: d > 0. Right: d < 0.

Remark 2.2 The autocovariance function of the model (1.7) has an explicit form given in Propo-
sition 5.1 of Kechagias and Pipiras (2015). The same form can obviously be used for the model
(1.9)–(1.10). We also note here that, strictly speaking, Proposition 5.1 is incorrect as stated in the
context of Kechagias and Pipiras (2015). As indicated in a footnote in Section 1 above, Kechagias
and Pipiras (2015) use the convention EX0 Xn' for autocovariances. But the proof of Proposition 5.1 is based on Brockwell and Davis (2009), who use the different convention EXn X0' and which
is also adopted here. The result of Proposition 5.1 is thus correct but using the convention of this
paper, and correct only up to matrix transposition in the context of Kechagias and Pipiras (2015).

Remark 2.3 Technical issues of Proposition 2.1 aside, there is a simple way to see why the pro-
posed model will yield a general phase. Note that a generic term e−iπd/2 + ceiπd/2 entering (2.6)
can be expressed in polar coordinates as

$$e^{-i\pi d/2} + c\, e^{i\pi d/2} = a_d(c)\, e^{-i b_d(c)}. \tag{2.15}$$

The generic shape of the function bd (c), c ∈ (−1, 1), is given in Figure 2, left plot, for d ∈ (0, 1/2),
and right plot, for d ∈ (−1/2, 0). When d ∈ (0, 1/2), the range of bd (c), c ∈ (−1, 1), is (0, π/2)
and when d ∈ (−1/2, 0), it is (−π/2, 0). When combined into the phase φc of (2.7), this obviously
leads to the phase φc that covers the whole range (−π/2, π/2). This discussion also shows that, for
example, the choice Q− = cQ+, c ∈ (−1, 1), would not lead to a general phase parameter for the resulting bivariate LRD models. A related note is that we presently do not have an explicit form
for the inverse filter ∆c (B). But this observation also suggests that one could possibly work with
the filter
$$\widetilde{\Delta}_c(B)^{-1} = \big((I - B)^{D} + C (I - B^{-1})^{D}\big)^{-1}, \tag{2.16}$$

if the goal is to have an explicit form of the filter ∆̃c(B) applied to the series {Xn}n∈Z. Other filters
than (1.10) and (2.16) with interesting properties might also exist and could also be considered. In
using (1.10), we aimed to have a general phase and an explicit form of the autocovariance function.

Remark 2.4 We have assumed in Proposition 2.1 that both component series are LRD, that is,
d1 , d2 ∈ (0, 1/2). In fact, the proposed model is not fully suited to accommodate the case when d’s
are allowed to belong to the so-called “principal” range d ∈ (−1/2, 1/2), including the case d = 0
associated with short-range dependence (SRD, for short). For example, if d2 = 0, the discussion in
Remark 2.3 shows that the phase φc covers the range (0, π/2), and thus excludes φc ∈ (−π/2, 0).
Similarly, when d2 < 0, only part of the range (0, π/2) is covered. When d1 , d2 ∈ (−1/2, 1/2),
a general phase could in fact be obtained by one of the following models (the introduced model
given first): with c ∈ (−1, 1),

$$\begin{array}{c|c|c}
\text{Case of } d_1, d_2 & \text{Same sign} & \text{Opposite signs} \\ \hline
Q_+ & Q & \begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix} Q =: C_+ Q \\
Q_- & \begin{pmatrix} c & 0 \\ 0 & -c \end{pmatrix} Q =: C Q & \begin{pmatrix} 1 & 0 \\ 0 & -c \end{pmatrix} Q =: C_- Q
\end{array}$$
Indeed, this could be seen by using the expression (2.6) (modified accordingly for the model with
Q+ = C+ Q and Q− = C− Q) and the discussion found in Remark 2.3. When one of the d’s is zero,
say d2 = 0, the model with Q+ = Q, Q− = CQ gives a positive phase φc only as discussed in
Remark 2.3 but the model with Q+ = C+ Q, Q− = C− Q gives a negative phase φc . In practice,
the two models could be fitted for the range d1 , d2 ∈ (−1/2, 1/2) and the model with the larger
likelihood could be selected (a BIC or other model selection criterion could also be used if the
numbers of parameters in larger models, as those considered below, are different). Whether the
two models could be combined into a single model remains an open question.

Remark 2.5 We stress again that the case c = 0 corresponds to the phase φ = (d1 − d2 )π/2 (and
in particular not necessarily φ = 0.) Note also that if the two component series are interchanged (so
that d1 and d2 are interchanged, and φ becomes −φ), then the constant c in (2.1) changes to −c.
Finally, we note that the relation (2.1) does not involve the covariance matrix Σ of the innovation
terms, but that (2.11) and (2.12) obviously do.

Remark 2.6 Finally, we also note the following important point regarding the boundaries c = ±1
of the range c ∈ (−1, 1). As c → ±1, the phase parameter φc → ∓π/2. In the specification ω12 e−iφc
of the cross-spectrum constant, note that the cases φc = π/2 and φc = −π/2 are equivalent by
changing the sign of ω12 , since ω12 e−iπ/2 = (−ω12 )e−i(−π/2) . In the model (1.9)–(1.10), the sign of
ω12 is the same as the sign of σ12. From a practical perspective, this observation means that for the
model (1.9)–(1.10) with c close to 1 (−1, resp.), it would be common to estimate c close to −1 (1,
resp.) and σ12 with the opposite sign, since the respective models are not that different. This is
also certainly what we observed in our simulations.

3 General phase VARFIMA(p, D, q) series
In this section, we generalize the model (1.9)–(1.10) by introducing autoregressive (AR, for short)
and moving average (MA, for short) components to capture potential short-range dependence
effects. For the one-sided model (1.1), this extension has been achieved in a number of ways.
Naturally, we focus on extensions that preserve the general phase and identifiability properties. We
also consider the problem of computing (theoretically or numerically) the autocovariance functions
of the introduced models, since these functions are used in estimation (see Section 4 below).

3.1 VARFIMA(0, D, q) series


We begin with the case p = 0 (where there is no AR part). Define the general phase
VARFIMA(0, D, q) series (two-sided VARFIMA(0, D, q) series) as

Yn = ∆c (B)−1 Θ(B)Zn , (3.1)

where ∆c (B)−1 is the operator given by (1.10) and

Θ(B) = I2 + Θ1 B + . . . + Θq B q (3.2)

is a matrix polynomial with 2 × 2 real-valued matrices Θs = (θjk,s )j,k=1,2 , s = 1, . . . , q. As through-


out this paper, {Zn }n∈Z is a white noise series with EZn Zn0 = Σ = (σjk )j,k=1,2 . In the special case
where Θ(B) is diagonal or when d1 = d2 , the model (3.1) is equivalent to

Yn = Θ(B)∆c (B)−1 Zn . (3.3)

The two operators ∆c (B)−1 and Θ(B), however, do not commute in general. In fact, the two models
in (3.1) and (3.3) are quite different. More specifically, if Θ(B) has at least one nonzero element
on the off diagonal and if d1 6= d2 , the series {Yn }n∈Z in (3.3) can be thought to exhibit a form
of fractional cointegration, by writing Θ(B)−1 Yn = ∆c (B)−1 Zn where the reduction of memory in
one of the component series of {Yn }n∈Z could occur from linear combination of present and past
variables of the two component series. On the other hand, fractional cointegration cannot occur
under the model (3.1). In the rest of this work, we will restrict our attention to this simpler case,
leaving the investigation of fractional cointegration for future work.
In the next proposition, we compute the autocovariance function of the series in (3.1). Tsay
(2010) calculated the autocovariance function of the one-sided analogue of (3.1) using the properties
of the hypergeometric function. Our approach, which we find less cumbersome for the multivariate
case, is similar to the one used for the two-sided VARFIMA(0, D, 0) series in Proposition 5.1 of
Kechagias and Pipiras (2015) (see also Remark 2.2 above).

Proposition 3.1 The (j, k) component γjk(n) of the autocovariance matrix function γ(n) of the bivariate two-sided VARFIMA(0, D, q) series in (3.1) is given by

$$\gamma_{jk}(n) = \frac{1}{2\pi} \sum_{u,v=1}^{2} \sum_{s,t=0}^{q} \theta_{ju,s}\, \theta_{kv,t}\, \sigma_{uv} \Big( a_{1,jk}\, \gamma^{(1)}_{st,jk}(n) + a_{2,j}\, \gamma^{(2)}_{st,jk}(n) + \gamma^{(3)}_{st,jk}(n) + a_{4,k}\, \gamma^{(4)}_{st,jk}(n) \Big), \tag{3.4}$$

where Θs = (θjk,s)j,k=1,2, s=1,...,q, Σ = (σjk)j,k=1,2,

$$a_{1,jk} = c^2 (-1)^{j+k}, \quad a_{2,j} = c(-1)^{j+1}, \quad a_{4,k} = c(-1)^{k+1}, \tag{3.5}$$

and

$$\gamma^{(1)}_{st,jk}(n) = \gamma^{(3)}_{st,kj}(n) = 2\Gamma(1 - d_j - d_k) \sin(\pi d_k)\, \frac{\Gamma(n + t - s + d_k)}{\Gamma(n + t - s + 1 - d_j)},$$

$$\gamma^{(4)}_{st,jk}(n) = \gamma^{(2)}_{ts,jk}(-n) = \begin{cases} 2\pi\, \dfrac{1}{\Gamma(d_j + d_k)}\, \dfrac{\Gamma(d_j + d_k + n + t - s)}{\Gamma(1 + n + t - s)}, & n \geq s - t, \\ 0, & n < s - t. \end{cases} \tag{3.6}$$

Proof: By using Theorem 11.8.3 in Brockwell and Davis (2009), the VARFIMA(0, D, q) series in (3.1) has a spectral density matrix

$$f(\lambda) = \frac{1}{2\pi}\, G(\lambda)\, \Sigma\, G(\lambda)^*, \tag{3.7}$$

where G(λ) = ∆c(e^{−iλ})^{−1} Θ(e^{−iλ}). The (j, k) component of the spectral density is given by

$$f_{jk}(\lambda) = \frac{1}{2\pi} \sum_{u,v=1}^{2} \sum_{s,t=0}^{q} \theta_{ju,s}\, \theta_{kv,t}\, \sigma_{uv}\, e^{-i(s-t)\lambda} \big(f_{1,jk}(\lambda) + f_{2,jk}(\lambda) + f_{3,jk}(\lambda) + f_{4,jk}(\lambda)\big), \tag{3.8}$$

where

$$f_{1,jk}(\lambda) = a_{1,jk} (1 - e^{i\lambda})^{-d_j} (1 - e^{-i\lambda})^{-d_k}, \qquad f_{2,jk}(\lambda) = a_{2,j} (1 - e^{i\lambda})^{-(d_j + d_k)},$$
$$f_{3,jk}(\lambda) = (1 - e^{-i\lambda})^{-d_j} (1 - e^{i\lambda})^{-d_k}, \qquad f_{4,jk}(\lambda) = a_{4,k} (1 - e^{-i\lambda})^{-(d_j + d_k)}. \tag{3.9}$$

Consequently, the (j, k) component of the autocovariance matrix satisfies $\gamma_{jk}(n) = \int_0^{2\pi} e^{in\lambda} f_{jk}(\lambda)\, d\lambda$, which in view of the relations (3.8)–(3.9) implies (3.4)–(3.5) with
$$\gamma^{(1)}_{st,jk}(n) = \gamma^{(3)}_{st,kj}(n) = \int_0^{2\pi} e^{i(n-s+t)\lambda} (1 - e^{i\lambda})^{-d_j} (1 - e^{-i\lambda})^{-d_k}\, d\lambda,$$

$$\gamma^{(2)}_{st,jk}(n) = \int_0^{2\pi} e^{i(n-s+t)\lambda} (1 - e^{i\lambda})^{-x_{jk}}\, d\lambda, \qquad \gamma^{(4)}_{st,jk}(n) = \int_0^{2\pi} e^{i(n-s+t)\lambda} (1 - e^{-i\lambda})^{-x_{jk}}\, d\lambda,$$
where xjk = dj + dk . The relations (3.6) follow from the evaluation of the integrals above as in the
proof of Proposition 5.1 of Kechagias and Pipiras (2015). 
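The formulas in Proposition 3.1 translate directly into code. The following minimal Python sketch (our illustration; the helper names are not from the paper) evaluates γjk(n) for the q = 0 case, where Θ0 = I2 and the double sum collapses to the single term u = j, v = k; gamma functions at large |n| may require log-gamma for numerical stability.

```python
import math

def gamma1(n, dj, dk, s=0, t=0):
    # gamma^{(1)}_{st,jk}(n) in (3.6); valid for non-integer d's and moderate |n|
    m = n + t - s
    return (2.0 * math.gamma(1.0 - dj - dk) * math.sin(math.pi * dk)
            * math.gamma(m + dk) / math.gamma(m + 1.0 - dj))

def gamma4(n, dj, dk, s=0, t=0):
    # gamma^{(4)}_{st,jk}(n) in (3.6); equals zero for n < s - t
    if n < s - t:
        return 0.0
    m = n + t - s
    x = dj + dk
    return 2.0 * math.pi * math.gamma(x + m) / (math.gamma(x) * math.gamma(1.0 + m))

def acvf_varfima0(n, j, k, d, Sigma, c):
    # gamma_{jk}(n) from (3.4)-(3.5) with q = 0; j, k take the values 1, 2 as in the paper
    dj, dk = d[j - 1], d[k - 1]
    a1 = c**2 * (-1.0) ** (j + k)       # a_{1,jk}
    a2 = c * (-1.0) ** (j + 1)          # a_{2,j}
    a4 = c * (-1.0) ** (k + 1)          # a_{4,k}
    g1 = gamma1(n, dj, dk)              # gamma^{(1)}_{00,jk}(n)
    g3 = gamma1(n, dk, dj)              # gamma^{(3)}_{00,jk}(n) = gamma^{(1)}_{00,kj}(n)
    g2 = gamma4(-n, dj, dk)             # gamma^{(2)}_{00,jk}(n) = gamma^{(4)}_{00,jk}(-n)
    g4 = gamma4(n, dj, dk)              # gamma^{(4)}_{00,jk}(n)
    return Sigma[j - 1][k - 1] * (a1 * g1 + a2 * g2 + g3 + a4 * g4) / (2.0 * math.pi)

# Example with the parameter values used in the simulations of Section 5
d, c = (0.2, 0.4), 0.6
Sigma = [[3.0, 0.5], [0.5, 3.0]]
print(acvf_varfima0(5, 1, 2, d, Sigma, c))   # cross-covariance gamma_{12}(5)
```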
Remark 3.1 Since Θ(e−iλ ) ∼ I2 +Θ1 +. . .+Θq as λ → 0, and since the relation (2.1) in Proposition
2.1 does not involve Σ, the two-sided VARFIMA(0, D, q) model has a general phase at the zero
frequency (with the same relation (2.1) between the phase φc and the parameter c). The parameters
of Θ’s are identifiable if and only if they are identifiable for the same VARMA(0, q) model.

3.2 VARFIMA(p, D, q) series


We extend here the model (3.1) to a general phase fractionally integrated model containing both
autoregressive and moving average components. As for the one-sided model (1.1), two possibilities
can be considered for this extension. Let Φ(B) = I2 −Φ1 B−. . .−Φp B p be the AR polynomial, where
Φr = (φjk,r )j,k=1,2 , r = 1, . . . , p, are 2×2 real-valued matrices. Following the terminology of Sela and
Hurvich (2009), define the general phase VARFIMA(p, D, q) series (two-sided VARFIMA(p, D, q)
series) {Xn }n∈Z as
$$\Phi(B) X_n = \Delta_c(B)^{-1} \Theta(B) Z_n, \tag{3.10}$$

and the general-phase FIVARMA(p, D, q) series² (two-sided FIVARMA(p, D, q) series) as

$$\Phi(B) \Delta_c(B) X_n = \Theta(B) Z_n. \tag{3.11}$$

² The names VARFIMA and FIVARMA refer to the facts that the fractional integration (FI) operator ∆c(B)^{-1} is applied to the MA part in (3.10), and after writing Xn = ∆c(B)^{-1} Φ(B)^{-1} Θ(B)Zn, it is applied to the VARMA series in (3.11).
(A priori, it is not clear whether (3.10) and (3.11) have a general phase but we use the terminology
in line with Sections 2 and 3.1.) The one-sided FIVARMA(p, D, q) series (with c = 0 in (3.11))
have been more popular in the literature, with Lobato (1997), Sela and Hurvich (2009) and Tsay
(2010) being notable exceptions. In particular, Sela and Hurvich (2009) investigated thoroughly
the differences between the one-sided analogues of the models (3.10) and (3.11), focusing on models
with no MA part.
As expected, the two-sided VARFIMA and FIVARMA series differ if Φ(B) is nondiagonal and
if d1 6= d2 . Similarly to the discussion around the models (3.1) and (3.3), the VARFIMA model
with nondiagonal Φ(B) allows for fractional cointegration in the sense discussed following the
relation (3.3), which however cannot be produced by the FIVARMA model (3.11) (see Sela and
Hurvich (2009) for more details in the one-sided case). As indicated earlier, the case of fractional
cointegration will be pursued elsewhere (though we shall also briefly mention some numerical results
in Section 5).
We will focus on the VARFIMA(p, D, q) series (3.10) with a diagonal AR part, in which case
the two models (3.10) and (3.11) are equivalent. Besides the obvious computational and simpli-
fication advantages of this assumption, our consideration is also justified by similar assumptions
recently used in Dufour and Pelletier (2014) for the construction of identifiable multivariate short-
range dependent time series models. More specifically, Dufour and Pelletier (2014) show that any
VARMA(p, q) series can be transformed to have a diagonal AR (or MA) part with the cost of
increasing the order of the MA (or AR) component. As a consequence, they construct identifiable
representations of VARMA(p, q) series where either AR or MA part is diagonal. As in Remark
3.1, such two-sided VARFIMA(p, D, q) model has general phases at the zero frequency, and its
parameters are identifiable if and only if they are identifiable for the same VARMA(p, q) model.
The presence of the AR filter on the left-hand side of (3.10) makes it difficult to compute the au-
tocovariance function of the series explicitly. Closed form formulas for the autocovariance function
of the one-sided model (3.11) with c = 0 were provided by Sowell (1986), although his implementation is computationally inefficient as it requires multiple expensive evaluations of hypergeometric
functions. The slow performance of Sowell’s approach was also noted by Sela (2010), who proposed
fast approximate algorithms for calculating the autocovariance functions of the one-sided models
(3.10) and (3.11) with c = 0 when p = 1 and q = 0. Although not exact, Sela’s algorithms are
fast with negligible approximation errors. In fact, it is straightforward to extend these algorithms
to calculate the autocovariance function of a two-sided VARFIMA(1, D, q) series. For models with
AR components of higher orders, however, this extension seems to require restrictive assumptions
on the AR coefficients and therefore we do not pursue this approach.

Remark 3.2 There is yet another reason for making our assumption of a diagonal AR part Φ(B).
By using the reparametrizations of Dufour and Pelletier (2014), the FIVARMA model (3.11) can
take the form (3.10) with diagonal Φ(B). Indeed, write first the model (3.11) as

∆c (B)Xn = Φ(B)−1 Θ(B)Zn .

Next, by using the relation Φ(B)−1 = |Φ(B)|−1 adj(Φ(B)), where | · | and adj(·) denote the deter-
minant and adjoint of a matrix respectively, we can write

∆c (B)|Φ(B)|Xn = adj(Φ(B))Θ(B)Zn , (3.12)

where the commutation of ∆c (B) and |Φ(B)| is possible since |Φ(B)| is scalar-valued. Letting
Φ̃(B) = diag(|Φ(B)|) and Θ̃(B) = adj(Φ(B))Θ(B), the relation (3.12) yields

$$\Delta_c(B)\, \widetilde{\Phi}(B)\, X_n = \widetilde{\Theta}(B)\, Z_n. \tag{3.13}$$

Thus, a FIVARMA model with AR component of order p can indeed be written as a VARFIMA
model with a diagonal AR part whose order will not exceed 2p (the maximum possible order of
|Φ(B)|).
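As a concrete illustration of this rewriting (our own worked case for p = 1, under the notation above), take Φ(B) = I2 − Φ1 B with Φ1 = (φjk)j,k=1,2. Then

$$|\Phi(B)| = 1 - (\phi_{11} + \phi_{22}) B + (\phi_{11}\phi_{22} - \phi_{12}\phi_{21}) B^2, \qquad \operatorname{adj}(\Phi(B)) = \begin{pmatrix} 1 - \phi_{22} B & \phi_{12} B \\ \phi_{21} B & 1 - \phi_{11} B \end{pmatrix},$$

so that Φ̃(B) = diag(|Φ(B)|) is a diagonal AR filter of order 2 and Θ̃(B) = adj(Φ(B))Θ(B) is an MA filter of order q + 1, in line with the bound 2p above.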

4 Estimation and other tasks


In this section, we discuss estimation of the general phase VARFIMA(p, D, q) model (3.10) intro-
duced in Section 3.2. Estimation of the parameters of this model can be carried out by adapting
the CLDL (Conditional Likelihood Durbin Levinson) estimation of Tsay (2010). Tsay’s method is
appealing in our case for a number of reasons. First, as discussed in Section 4.1 below, the method
requires only the knowledge of the autocovariance function of the general phase VARFIMA(0, D, q)
series (3.1) for which we have an explicit form. Second, Tsay’s algorithm can be modified easily to
yield multiple steps-ahead (finite sample) forecasts of the series. Finally, Tsay’s method has a mild
computational cost, compared to most alternative estimation methods.

4.1 Estimation
The basic idea of Tsay’s CLDL algorithm is to transform a VARFIMA(p, D, q) series to a
VARFIMA(0, D, q) series whose autocovariance function has a closed form. Then, a straightforward
implementation of the well-known Durbin-Levinson (DL, for short) algorithm allows one to replace
the computationally expensive likelihood calculations of the determinant and the quadratic part
with less time consuming operations. We give next a brief description of the algorithm, starting
with some notation.
Let {Yn }n=1,...,N be the two-sided VARFIMA(0, D, q) series (3.1) and let Γ(k) = EYk Y00 denote
its autocovariance function. Let also Θ = (vec(Θ1 )0 , . . . , vec(Θq )0 )0 be the vector containing the
entries of the coefficient matrices of the MA polynomial Θ(B). Assuming that the bivariate white
noise series {Zn } is Gaussian, we can express the likelihood function of {Yn }n=1,...,N with the aid of
the multivariate DL algorithm (see Brockwell and Davis (2009), p. 422). More specifically, letting
η = (d1 , d2 , c, σ11 , σ12 , σ22 , Θ0 )0 be the (6 + 4q)–dimensional vector containing all the parameters of
the model (3.1), we can write the likelihood function as
$$L(\eta; Y) = (2\pi)^{-N} \Big( \prod_{j=0}^{N-1} |V_j| \Big)^{-1/2} \exp\Big\{ -\frac{1}{2} \sum_{j=0}^{N-1} (Y_{j+1} - \widehat{Y}_{j+1})'\, V_j^{-1}\, (Y_{j+1} - \widehat{Y}_{j+1}) \Big\}, \tag{4.1}$$

where Ybj+1 = E(Yj+1 |Y1 , . . . , Yj ) and Vj , j = 0, . . . , N − 1, are the one-step-ahead finite sam-
ple predictors and their corresponding error covariance matrices obtained by the multivariate DL
algorithm. Using the fact that the series {Yn }n=1,...,N satisfies the relation

Φ(B)Xn = Yn , (4.2)

where {Xn }n=1,...,N is the two-sided VARFIMA(p, D, q) series (3.10), we can view
{Φ(B)Xn }n=p+1,...,N as a two-sided VARFIMA(0, D, q) series, whose likelihood function conditional
on X1 , . . . , Xp and Φ = (vec(Φ1 )0 , . . . , vec(Φp )0 )0 is given by

L(Φ, η; Xn |X1 , . . . , Xp ) ≡ L(η; Φ(B)Xn ), n = p + 1, . . . , N. (4.3)

The reason we do not absorb Φ in η is to emphasize the different roles that these two parameters have in calculating the likelihood function in (4.3). More specifically, Φ is used to transform the available data {Xn}n=1,...,N to a two-sided VARFIMA(0, D, q) series {Yn}n=1,...,N, while η is necessary to apply the DL algorithm.
The conditional likelihood estimators of Φ and η are then given by
$$(\widehat{\Phi}, \widehat{\eta}) = \underset{\Phi,\, \eta \in S}{\operatorname{argmax}}\ L(\Phi, \eta; X_n | X_1, \ldots, X_p), \tag{4.4}$$

where S = {η ∈ R^{6+4q} : 0 < d1, d2 < 0.5, −1 < c < 1, |Σ| = σ11 σ22 − σ12² > 0, (σjj)j=1,2 ≥ 0} denotes the parameter space for η. Although there is no closed form solution for the estimates Φ̂ and η̂, they can be computed numerically using the quasi-Newton algorithm of Broyden, Fletcher, Goldfarb, and Shanno (BFGS).
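To make the structure of (4.1)–(4.4) concrete, here is a minimal sketch (ours, not the paper's SAS/IML implementation) of the conditional Gaussian log-likelihood for p = 1: the data are filtered by the AR part as in (4.2), and the likelihood of the resulting series is evaluated directly from its autocovariances by building the block-Toeplitz covariance matrix and using a Cholesky factorization. This brute-force O(N³) evaluation is a stand-in for the DL recursion, used here only for clarity; the function acvf is assumed to return the 2 × 2 matrix Γ(k), e.g. via (3.4)–(3.6).

```python
import numpy as np

def conditional_loglik(X, Phi1, acvf):
    # Conditional Gaussian log-likelihood of a bivariate VARFIMA(1, D, q) sample X of
    # shape (N, 2): filter by the AR(1) part and evaluate the exact Gaussian
    # log-likelihood of Y_n = X_n - Phi1 X_{n-1} from its autocovariances.
    # acvf(k) must return the 2x2 matrix Gamma(k) = E Y_{n+k} Y_n'.
    Y = X[1:] - X[:-1] @ Phi1.T                     # Y_n, n = 2, ..., N
    M = Y.shape[0]
    Gamma = {k: np.asarray(acvf(k)) for k in range(-(M - 1), M)}
    Sigma_Y = np.block([[Gamma[i - j] for j in range(M)] for i in range(M)])
    y = Y.reshape(-1)
    L = np.linalg.cholesky(Sigma_Y)                 # Sigma_Y = L L'
    z = np.linalg.solve(L, y)                       # whitened residuals
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + z @ z)
```

In (4.4), this is the objective that the BFGS step would maximize jointly over Φ and η, the latter entering through acvf.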

4.2 Forecasting
The multivariate DL algorithm used in the estimation above yields the coefficient matrices
Φn,1 , . . . , Φn,n in the 1–step-ahead forecast (predictor)

Ybn+1 := Ybn+1|n := E(Yn+1 |Y1 , . . . , Yn ) = Φn,1 Yn + . . . + Φn,n Y1 , (4.5)

as well as the associated forecast error matrix Vn = E(Yn+1 −Ybn+1 )(Yn+1 −Ybn+1 )0 . The h–step-ahead
forecasts, h ≥ 1, on the other hand, are given by
$$\widehat{Y}_{n+h|n} := E(Y_{n+h} | Y_1, \ldots, Y_n) = F^h_{n,1} Y_n + \ldots + F^h_{n,n} Y_1, \tag{4.6}$$

where F^h_{n,k}, k = 1, ..., n, are 2 × 2 real-valued coefficient matrices, with the corresponding forecast error matrix

$$W_{n+h-1|n} = E(Y_{n+h} - \widehat{Y}_{n+h|n})(Y_{n+h} - \widehat{Y}_{n+h|n})'. \tag{4.7}$$
The 1-step-ahead forecasts can be used recursively by repeated conditioning to obtain recursive expressions for the coefficient matrices F^h_{n,k} and an expression for the error matrix W_{n+h−1|n}, as stated in the next result. The standard proof is omitted for shortness sake.
Proposition 4.1 Let h ≥ 1 and Φ_{n,k}, n ≥ 1, k = 1, ..., n, be as above. Then, the matrices F^h_{n,k} in (4.6) satisfy the recursive relation

$$F^h_{n,k} = \Phi_{n+h-1,h+k-1} + \sum_{j=1}^{h-1} \Phi_{n+h-1,j}\, F^{h-j}_{n,k}, \tag{4.8}$$

with F^1_{n,k} := Φ_{n,k}, n ≥ 1, k = 1, ..., n. Moreover, the corresponding error matrices W_{n+h−1|n} in relation (4.7) are given by

$$W_{n+h-1|n} = \Gamma(0) - \sum_{j=1}^{n} F^h_{n,j}\, \Gamma(h + j - 1)', \tag{4.9}$$

where Γ(n) = EYn Y0' is the autocovariance matrix function of {Yn}n∈Z.
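As a quick illustration of the recursion (4.8) (our own worked case), for h = 2 the sum contains the single term j = 1, so that

$$F^2_{n,k} = \Phi_{n+1,k+1} + \Phi_{n+1,1}\, \Phi_{n,k}, \qquad k = 1, \ldots, n,$$

which requires only the one-step DL coefficient matrices at sample sizes n and n + 1.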

Remark 4.1 In the time series literature (e.g. Brockwell and Davis (2009)), it is more succinct
and common to express the h–step-ahead forecasts by using the coefficient matrices appearing in
the multivariate Innovations (IN, for short) algorithm. We use the DL algorithm in both estimation
and forecasting since it is faster than the IN algorithm: the coefficient matrices Φn,k in (4.5) are
computed in O(n2 ) number of steps, whereas the computational complexity for the analogous
coefficients in the IN algorithm is O(n3 ).

The DL algorithm used in the forecasting procedure and formulae above is based on the assump-
tion that the autocovariance function of the time series {Yn }n∈Z can readily be computed, as for
example for the two-sided VARFIMA(0, D, q) series. We now turn our attention to the two-sided
VARFIMA(p, D, q) series {Xn }n∈Z defined through {Yn }n∈Z in (4.2). As we do not have an explicit
form of the autocovariance function of {Xn }n∈Z , it is not immediately clear how to calculate the
h–step-ahead forecasts

$$\widehat{X}_{n+h|n} = E(X_{n+h} | X_1, \ldots, X_n)$$

and the corresponding error matrices

$$\widetilde{W}_{n+h-1|n} = E(X_{n+h} - \widehat{X}_{n+h|n})(X_{n+h} - \widehat{X}_{n+h|n})',$$

n ≥ 1, h ≥ 1. In Proposition 4.2 below, we show that X̂_{n+h|n} and W̃_{n+h−1|n} can be calculated approximately and recursively from Ŷ_{n+h|n} and W_{n+h−1|n}. For simplicity and since this order will be used in the simulations and the application below, we focus on the case p = 1. However, the proposition can be extended for larger values of p.

Proposition 4.2 Let F^h_{n,k}, n ≥ 1, k = 1, ..., n, be as in (4.6). Then, the h–step-ahead forecasts X̂_{n+h|n} = E(X_{n+h}|X_1, ..., X_n) satisfy

$$\widehat{X}_{n+h|n} = \widehat{X}^{(a)}_{n+h|n} + R_{n+h|n}, \tag{4.10}$$

where

$$\widehat{X}^{(a)}_{n+h|n} = \Phi_1^h X_n + \sum_{s=0}^{h-1} \Phi_1^s\, \widehat{Y}_{n+h-s|n}, \tag{4.11}$$

$$R_{n+h|n} = \sum_{s=0}^{h-1} \Phi_1^s \Big( E(Y_{n+h-s} | X_1, \ldots, X_n) - E(Y_{n+h-s} | Y_1, \ldots, Y_n) \Big). \tag{4.12}$$

Moreover, the error matrices $\widetilde{W}^{(a)}_{n+h-1|n} = E(X_{n+h} - \widehat{X}^{(a)}_{n+h|n})(X_{n+h} - \widehat{X}^{(a)}_{n+h|n})'$ can be computed by

$$\widetilde{W}^{(a)}_{n+h-1|n} = \sum_{s=0}^{h-1} \Phi_1^s\, W_{n+h-s-1|n}\, (\Phi_1^s)' + \sum_{\substack{s,t=0 \\ s \neq t}}^{h-1} \Phi_1^s\, A_{s,t}(n+h)\, (\Phi_1^t)', \tag{4.13}$$

where

$$A_{s,t}(n+h) = \Gamma(t - s) - \sum_{k=1}^{n} \Gamma(h - s + k - 1)\, (F^{h-t}_{n,k})' \tag{4.14}$$

and Γ(n) = EYn Y0' is the autocovariance matrix function of {Yn}n∈Z.

Proof: By using the relation (4.2) recursively, we can write

$$X_{n+h} = \Phi_1^h X_n + \sum_{s=0}^{h-1} \Phi_1^s\, Y_{n+h-s}, \qquad h = 1, 2, \ldots \tag{4.15}$$

which implies that

$$\widehat{X}_{n+h|n} = \Phi_1^h X_n + \sum_{s=0}^{h-1} \Phi_1^s\, E(Y_{n+h-s} | X_1, \ldots, X_n). \tag{4.16}$$

Since E(Y_{n+h−s}|Y_1, ..., Y_n) = Ŷ_{n+h−s|n}, the relation (4.16) yields (4.11).
Next, we subtract (4.11) from (4.15) to get

$$X_{n+h} - \widehat{X}^{(a)}_{n+h|n} = \sum_{s=0}^{h-1} \Phi_1^s\, (Y_{n+h-s} - \widehat{Y}_{n+h-s|n}).$$

The h–step-ahead error matrix $\widetilde{W}^{(a)}_{n+h-1|n}$ is then given by

$$\widetilde{W}^{(a)}_{n+h-1|n} = E\Big( \sum_{s=0}^{h-1} \Phi_1^s (Y_{n+h-s} - \widehat{Y}_{n+h-s|n}) \Big)\Big( \sum_{t=0}^{h-1} \Phi_1^t (Y_{n+h-t} - \widehat{Y}_{n+h-t|n}) \Big)' = \sum_{s=0}^{h-1} \Phi_1^s\, W_{n+h-s-1|n}\, (\Phi_1^s)' + \sum_{\substack{s,t=0 \\ s \neq t}}^{h-1} \Phi_1^s\, A_{s,t}(n+h)\, (\Phi_1^t)', \tag{4.17}$$

where A_{s,t}(u) = E(Y_{u−s} − Ŷ_{u−s|n})(Y_{u−t} − Ŷ_{u−t|n})'. To show that A_{s,t}(u) satisfies (4.14), note that for s, t = 0, ..., u − n − 1, s ≠ t, we have

$$E \widehat{Y}_{u-s|n} Y_{u-t}' = E\big(E(\widehat{Y}_{u-s|n} Y_{u-t}' | Y_1, \ldots, Y_n)\big) = E\, \widehat{Y}_{u-s|n}\, E(Y_{u-t} | Y_1, \ldots, Y_n)' = E \widehat{Y}_{u-s|n} \widehat{Y}_{u-t|n}'.$$

Hence,

$$A_{s,t}(u) = E Y_{u-s} Y_{u-t}' - E Y_{u-s} \widehat{Y}_{u-t|n}' - E \widehat{Y}_{u-s|n} Y_{u-t}' + E \widehat{Y}_{u-s|n} \widehat{Y}_{u-t|n}' = \Gamma(t - s) - E Y_{u-s} \widehat{Y}_{u-t|n}' = \Gamma(t - s) - E Y_{u-s} \Big( \sum_{k=1}^{n} F^{u-t-n}_{n,k} Y_{n-k+1} \Big)' = \Gamma(t - s) - \sum_{k=1}^{n} \Gamma(u - s - n + k - 1)\, (F^{u-t-n}_{n,k})', \tag{4.18}$$

yielding the relations (4.13)–(4.14).


Since Xn − Φ1 Xn−1 = Yn for the VARFIMA(1, D, q) series {Xn} and the VARFIMA(0, D, q) series {Yn}, the approximation error R_{n+h|n} in (4.12) becomes negligible for large n. For this reason, in the simulations and the application below, we shall use the approximate forecasts X̂^{(a)}_{n+h|n} in (4.10) and their forecast error matrices W̃^{(a)}_{n+h−1|n} given by (4.13).

5 Simulation study
In this section, we perform a Monte Carlo simulation study to assess the performance of the CLDL
algorithm for the VARFIMA(p, D, q) model (3.10) described in Section 4.1. We examine four
different models with AR and MA components of orders p, q = 0, 1. When either the MA or the
AR part is present, we shall consider a non-diagonal coefficient matrix. This is somewhat in contrast
to what was stated in Section 3.2 for the AR part but we sought to see what happens when the
AR part is non-diagonal as well and the results for the diagonal AR parts (not reported here) were
qualitatively similar or better. For each model, we consider three sample sizes N = 200, 400, 1000.
The Gaussian time series data are generated using the fast and exact synthesis algorithm of Helgason
et al. (2011), while the number of replications is 100.

To solve the maximization problem (4.4), we use the SAS/IML nlpqn function, which im-
plements the BFGS quasi-Newton method, a popular iterative optimization algorithm. For our
optimization scheme, we follow the approach found in Tsay (2010). A first step is to eliminate the nonlinear inequality constraint |Σ| ≥ 0 in the parameter space S (defined in Section 4.1), by letting Σ = U'U, where U = (Ujk)j,k=1,2 is an upper triangular matrix (Σ is nonnegative definite and such a factorization always exists). Then, the parameter vector θ can be written as θ = (d1, d2, c, U11, U12, U22, Θ')' while the parameter space becomes S = {θ ∈ R^{6+4q} : 0 < d1, d2 < 0.5, −1 < c < 1}.
Next, we describe our strategy on selecting initial parameter values (Φ^I, θ^I) for the BFGS method. Let

$$\Phi_1^0 = (\phi^0_{jk,1})_{j,k=1,2}, \qquad \theta^0 = \big(d_1^0, d_2^0, c^0, U_{11}^0, U_{12}^0, U_{22}^0, (\Theta_1^0)'\big)', \tag{5.1}$$

where Θ1^0 = (θ^0_{jk,1})j,k=1,2, be the true parameter values. We consider initial values

$$d_k^I = \frac{2 d_k^0}{1 + 2 d_k^0}, \quad c^I = \frac{2 c^0}{1 + |c^0|}, \quad U_{jk}^I = 1, \quad \theta^I_{jk,1} = \frac{e^{\theta^0_{jk,1}} - 1}{e^{\theta^0_{jk,1}} + 1}, \quad \phi^I_{jk,1} = \frac{e^{\phi^0_{jk,1}} - 1}{e^{\phi^0_{jk,1}} + 1}, \tag{5.2}$$
where j, k = 1, 2. Note that the transformations (5.2) are essentially perturbations of the true
parameter values that also retain the range of the parameter space S. For example, the value of dIk
will be zero (or 1/2) when d0k is also zero (or 1/2). Moreover, even though the parameter space S
does not include identifiability (including stability) constraints for the elements of the AR and MA
polynomials as discussed in Section 3.2, we did not encounter any cases where the optimization
algorithm considers such values.
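As a small illustration (our own helper, with an ad hoc parameter layout), the map (5.2) can be coded as follows.

```python
import numpy as np

def initial_values(d0, c0, theta0, phi0):
    # The map (5.2): perturb the true values while staying inside the parameter space S
    d0, theta0, phi0 = map(np.asarray, (d0, theta0, phi0))
    dI = 2 * d0 / (1 + 2 * d0)                               # d_k^I
    cI = 2 * c0 / (1 + abs(c0))                              # c^I
    UI = np.ones(3)                                          # U_{11}^I = U_{12}^I = U_{22}^I = 1
    thetaI = (np.exp(theta0) - 1) / (np.exp(theta0) + 1)     # theta^I_{jk,1}
    phiI = (np.exp(phi0) - 1) / (np.exp(phi0) + 1)           # phi^I_{jk,1}
    return dI, cI, UI, thetaI, phiI
```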
Table 1 and Figure 3 present estimation results for the four models considered. For all simu-
lations, we take (dropping the superscript 0 for simplicity) d1 = 0.2, d2 = 0.4, c = 0.6, Σ11 = 3,
Σ12 = 0.5, Σ22 = 3, and wherever present, Φ11 = φ11,1 = 0.5, Φ12 = φ12,1 = 0.2, Φ21 = φ21,1 = 0.4,
Φ22 = φ22,1 = −0.8, Θ11 = θ11,1 = 0.1, Θ12 = θ12,1 = −0.6, Θ21 = θ21,1 = 0.2, Θ22 = θ22,1 = 0.8.³
We also performed simulations for several other values of these parameters and got similar results
and therefore we omit them in favor of space economy. Table 1 lists the median differences between
the estimates and the corresponding true values, and the respective median absolute deviations.
Figure 3, on the other hand, includes the boxplots of the estimates for the various parameters.
While the table concerns only the results for the sample sizes N = 200 and 400, the figure also
includes the case of N = 1000.
The results in Table 1 and Figure 3 indicate a satisfactory performance of the CLDL algorithm
for most cases considered: the median differences are small overall and tend to decrease with the
increasing sample size, and the decrease with the increasing sample size is also evident for the
median deviations; moreover, many median deviations and box sizes are relatively small as well.
We also note that for some cases and smaller sample sizes, we could see bimodality in the
histograms of the corresponding estimates (not included here for shortness sake). For example, this
was the case for c, (p, q) = (1, 1) when N = 200, as suggested by the larger median difference in
Table 1. But bimodality either diminished or completely disappeared as the sample size increased.
Finally, we also comment on the model selection task concerning the one- and two-sided models
when using BIC and AIC. Figure 4, the left plot, presents the proportion of times that these informa-
tion criteria select the one-sided VARFIMA(0, D, 0) model over the two-sided VARFIMA(0, D, 0),
when in fact the latter model is true, for the same parameter values as in Table 1 in the case
p = q = 0. The right plot of the figure presents the analogous plot for a different set of values of
the parameters d1 , d2 and c. The performance of the model selection criteria is satisfactory overall.
³ For this choice of d1, d2, c, the phase parameter is equal to φ = 1.15. Taking c = −0.1985 with the same d's would yield zero phase.

Figure 3: The red solid lines are the medians and the blue dashed lines are the true parameter values (except for U11, U22 which are centered at 0). Top to bottom: (p, q) = (0, 0), (0, 1), (1, 0) and (1, 1). Left to right: N = 200, 400, 1000.

(p, q):            (0, 0)             (0, 1)             (1, 0)             (1, 1)
N:                 200      400       200      400       200      400       200           400
d1                 0.002   −0.002    −0.023   −0.015    −0.067   −0.007    −0.058        −0.017
                   0.030    0.031     0.058    0.040     0.110    0.065     0.117         0.059
d2                −0.017   −0.005    −0.011   −0.005    −0.022   −0.026    −0.024        −0.010
                   0.035    0.029     0.047    0.033     0.040    0.032     0.042         0.033
c                  0.025    0.010     0.083    0.031     0.042    0.035     0.171         0.180
                   0.059    0.041     0.119    0.065     0.066    0.048     0.101         0.101
Φ11/Θ11                                0.024    0.009     0.059    0.013     0.067/0.009   0.031/0.010
                                       0.069    0.044     0.081    0.057     0.092/0.066   0.065/0.055
Φ12/Θ12                                0.123    0.043    −0.022   −0.036    −0.049/0.159  −0.026/0.147
                                       0.074    0.030     0.080    0.066     0.076/0.191   0.052/0.216
Φ21/Θ21                                0.033    0.021     0.005    0.001     0.029/0.028   0.033/0.001
                                       0.113    0.077     0.025    0.016     0.034/0.091   0.026/0.084
Φ22/Θ22                               −0.061   −0.001    −0.002   −0.004     0.000/−0.242  0.001/−0.229
                                       0.061    0.039     0.021    0.012     0.008/0.129   0.008/0.079
U11               −0.041   −0.028    −0.114   −0.043    −0.051   −0.051    −0.194        −0.183
                   0.077    0.064     0.126    0.079     0.085    0.068     0.106         0.085
U12               −0.009    0.001     0.092    0.033    −0.005   −0.008     0.053         0.068
                   0.093    0.064     0.131    0.092     0.101    0.071     0.103         0.093
U22                0.064    0.016     0.212    0.057     0.084    0.077     0.209         0.265
                   0.202    0.135     0.296    0.191     0.242    0.145     0.220         0.200

Table 1: Median differences between the estimates and the corresponding true values (top value in each cell) and median absolute deviations (bottom value in each cell) for the estimated parameters of VARFIMA series with (p, q) = (0, 0), (0, 1), (1, 0), (1, 1).

6 Application
In this section, we apply the CLDL algorithm to analyze inflation rates in the U.S. under the
two-sided VARFIMA(p, D, q) model discussed in Section 3.2. Evidence of long-range dependence
behavior in inflation rates has been found in a number of works (see, for example, Baillie et
al. (1996), Doornik and Ooms (2004), Sela and Hurvich (2009), Baillie and Morana (2012) and references therein). More specifically, Sela and Hurvich (2009) tested the fit of several long- and
short-range dependent models on the annualized monthly inflation rates for goods and services
in the U.S. during the period of February 1956–January 2008 (N = 624 months) and selected a
one-sided VARFIMA model as the best choice. Besides their long memory features, however, the
time series of inflation rates often exhibit asymmetric behavior, and therefore call for multivariate
LRD models that allow for general phase.
Following the notation of Sela (2010),⁴ we denote the Consumer Price Indices series for commodities as {CPI_n^c}n=0,...,N and the corresponding series for services as {CPI_n^s}n=0,...,N. Then, we define the annualized monthly inflation rates for goods and services as

$$g_n = 1200\, \frac{CPI_n^c - CPI_{n-1}^c}{CPI_{n-1}^c} \qquad \text{and} \qquad s_n = 1200\, \frac{CPI_n^s - CPI_{n-1}^s}{CPI_{n-1}^s},$$

respectively. The two series {gn}n=1,...,N, {sn}n=1,...,N are depicted in Figure 5.⁵
⁴ See also the accompanying R code.
⁵ The consumer price indices (raw) data are available online from the Bureau of Labor Statistics.
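For completeness, the data preparation step can be sketched as follows (our illustration only; the file and column names are placeholders, not from the paper or its accompanying code).

```python
import pandas as pd

cpi = pd.read_csv("cpi_levels.csv")                  # assumed columns: "commodities", "services"
g = 1200 * cpi["commodities"].pct_change().dropna()  # g_n = 1200 (CPI^c_n - CPI^c_{n-1}) / CPI^c_{n-1}
s = 1200 * cpi["services"].pct_change().dropna()     # s_n defined analogously for services
```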

Figure 4: The proportion of times that the considered information criteria select the one-sided VARFIMA(0, D, 0) model over the two-sided one, when the latter is true.

Figure 5: Annualized monthly inflation rates for goods (left) and services (right) from February 1956 to January 2008.

The two plots in Figure 6 provide some motivation for why a general phase model is needed for
this dataset. More specifically, the left plot in Figure 6 depicts the sample cross-correlation function
ρb12 (h) of the two series for all lags such that |h| < 25. Observe that for negative lags the sample
cross-correlation function decays faster than for positive lags suggesting time-non-reversibility of
the series and hence non-zero phase.
Further evidence for general phase can be obtained from the local Whittle estimation of Robin-
son (2008) which can be used to estimate the phase and the LRD parameters directly from the data.
The estimation is semiparametric in the sense that it only requires specification of the spectral den-
sity at low frequencies. The right plot in Figure 6 depicts two local Whittle estimates of the phase
parameter φ as functions of m – a tuning parameter representing the number of lower frequencies
used in the estimation. The dashed line corresponds to the special phase estimate φb = (db1 − db2 )π/2
of the one-sided VARFIMA model based on the local Whittle estimates of the two d’s. On the
other hand, the solid line shows the phase parameter estimated directly from the data. The two
lines being visibly different suggest that the special phase parameter and the associated VARFIMA
model are not appropriate. A more detailed local Whittle analysis of the dataset can be found in
Baek et al. (2018).

Figure 6: Left plot: Sample cross-correlation ρ̂12(h) of the series {gn}n=1,...,N and {sn}n=1,...,N depicted in Figure 5 for |h| ≤ 25. Right plot: Local Whittle phase estimates, one corresponding to the one-sided VARFIMA (dashed line) and one estimated directly from the data (solid line). Both estimates are plotted as functions of a tuning parameter m = N^{0.25+0.0125k}, k = 1, . . . , 51, where N = 624 is the sample size of the series.

In the analysis of Sela and Hurvich (2009), the one-sided VARFIMA(1, D, 0) model was selected
as the best choice (based on AIC), amongst vector autoregressive models of both low and high orders
and also amongst one-sided VARFIMA(p, D, 0) and FIVARMA(p, D, 0) models with p ≤ 1. The
estimated VARFIMA(1, D, 0) model, in particular, was

$$\begin{aligned} g_n &= 0.3027\, g_{n-1} + 0.4245\, s_{n-1} + \epsilon_{1,n}, \\ s_n &= -0.0237\, g_{n-1} - 0.3085\, s_{n-1} + \epsilon_{2,n}, \end{aligned} \tag{6.1}$$

where

$$\begin{pmatrix} \epsilon_{1,n} \\ (I - B)^{0.4835}\, \epsilon_{2,n} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} 20.23 & 0.46 \\ 0.46 & 7.08 \end{pmatrix}\right). \tag{6.2}$$
We should note here that the SAS optimization algorithm we used produced estimates similar to
those of Sela’s algorithm (implemented in R), for all parameters except d1 which Sela estimates to
be zero while for this model we estimated it to be 0.191. More specifically, the estimated one-sided
VARFIMA(1, D, 0) model is

$$\begin{aligned} g_n &= 0.132\,(0.064)\, g_{n-1} + 0.076\,(0.080)\, s_{n-1} + \epsilon_{1,n}, \\ s_n &= 0.056\,(0.023)\, g_{n-1} - 0.308\,(0.044)\, s_{n-1} + \epsilon_{2,n}, \end{aligned} \tag{6.3}$$

where

$$\begin{pmatrix} (I - B)^{0.191\,(0.052)}\, \epsilon_{1,n} \\ (I - B)^{0.475\,(0.062)}\, \epsilon_{2,n} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} 20.21 & 0.75 \\ 0.75 & 7.02 \end{pmatrix}\right) \tag{6.4}$$

and the underlying U in Σ = U'U has U11 = 4.605 (0.131), U12 = 0.164 (0.109) and U22 = 2.645 (0.075), with standard errors of the estimates added in the parentheses throughout.
The parameter estimates in (6.1)–(6.2) reveal an interesting feature, noted by Sela (2010).
In particular, while the lagged services inflation has a significant influence on goods inflation, the
lagged goods inflation seems to have a small effect on services inflation. This behavior is potentially
related to the so-called gap between the prices in services and the prices in goods which was studied
by Peach et al. (2004). More specifically, the term gap refers to the tendency of prices in services

to increase faster than prices in goods. But note that this effect is not present in the estimated
model (6.3)–(6.4) and, in fact, is reversed.
For comparison, we present next two estimated two-sided VARFIMA(1, D, 0) models. The
estimated two-sided VARFIMA(1, D, 0) model with a diagonal AR component is

$$\begin{aligned} g_n &= 0.106\,(0.053)\, g_{n-1} + \epsilon_{1,n}, \\ s_n &= -0.439\,(0.080)\, s_{n-1} + \epsilon_{2,n}, \end{aligned} \tag{6.5}$$

where

$$\begin{pmatrix} \big((I - B)^{-0.231\,(0.043)} + 0.344\,(0.158)\,(I - B^{-1})^{-0.231}\big)^{-1} \epsilon_{1,n} \\ \big((I - B)^{-0.439\,(0.044)} - 0.344\,(I - B^{-1})^{-0.439}\big)^{-1} \epsilon_{2,n} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} 12.141 & 0.948 \\ 0.948 & 11.680 \end{pmatrix}\right) \tag{6.6}$$

and the corresponding U in Σ = U'U has U11 = 3.484 (0.408), U12 = 0.272 (0.154) and U22 = 3.407 (0.471). The estimated two-sided VARFIMA(1, D, 0) model with general AR component is

$$\begin{aligned} g_n &= 0.139\,(0.061)\, g_{n-1} + 0.100\,(0.084)\, s_{n-1} + \epsilon_{1,n}, \\ s_n &= 0.046\,(0.023)\, g_{n-1} - 0.416\,(0.087)\, s_{n-1} + \epsilon_{2,n}, \end{aligned} \tag{6.7}$$

where

$$\begin{pmatrix} \big((I - B)^{-0.190\,(0.056)} + 0.315\,(0.186)\,(I - B^{-1})^{-0.190}\big)^{-1} \epsilon_{1,n} \\ \big((I - B)^{-0.437\,(0.046)} - 0.315\,(I - B^{-1})^{-0.437}\big)^{-1} \epsilon_{2,n} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} 12.528 & 0.955 \\ 0.955 & 11.130 \end{pmatrix}\right) \tag{6.8}$$

and the corresponding U in Σ = U'U has U11 = 3.539 (0.499), U12 = 0.270 (0.167) and U22 = 3.326 (0.524). In terms of AIC, listed from smallest to largest, the estimated models are (6.7)–
(6.8), (6.3)–(6.4) and (6.5)–(6.6). In terms of BIC, the models are (6.5)–(6.6), (6.3)–(6.4) and
(6.7)–(6.8). In either case, a two-sided model is preferred to the one-sided model (6.3)–(6.4). We
also note that the models with the smallest AIC and BIC in the class of one- and two-sided models
(with the orders up to 1) are VARFIMA(1, D, 1). We focus on VARFIMA(1, D, 0) for shortness
sake and comparison to an earlier work by Sela.
We have a number of interesting observations related to the estimated parameters in (6.5)–(6.8).
First, note that the off-diagonal elements of the AR component in (6.7) are close to 0. This suggests
that the (a)symmetry behavior in the inflation rates is now captured by the parameter c. Indeed,
plugging the estimates of d1 , d2 and c from the models (6.5)–(6.6) and (6.7)–(6.8) in relation (2.9),
we obtain phase estimates around −0.85 which agree with the local Whittle estimate of the phase
in the right plot of Figure 6. Note that the local Whittle estimates of d1 and d2 which we did not
report here are close to 0.2 and 0.38 respectively, and hence they are also close to the estimated
parameters of the two two-sided VARFIMA models.
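The phase computation mentioned above is easy to verify numerically (our own check, using the estimated d1, d2 and c quoted in (6.6) and (6.8) together with relation (2.9)):

```python
import math

def h(c, d1, d2):
    # Relation (2.9)
    a1, a2 = math.tan(math.pi * d1 / 2), math.tan(math.pi * d2 / 2)
    return -math.atan((a1 * (c - 1) / (1 + c) + a2 * (1 + c) / (1 - c)) / (1 + a1 * a2))

print(h(0.344, 0.231, 0.439))   # estimates from (6.5)-(6.6): about -0.85
print(h(0.315, 0.190, 0.437))   # estimates from (6.7)-(6.8): about -0.85
```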
Second, related to the “gap” phenomenon discussed briefly following (6.4), we also note that, as with (6.3)–(6.4), this effect is not present and, in fact, reversed in the model (6.7)–(6.8) in the sense
that the AR coefficient φ12 is non-significant and the AR coefficient φ21 is (marginally) significant.
We note that the latter makes sense from the perspective of the estimated LRD parameters in the
following sense. As estimated, the driving noise series {1,n } and {2,n } have respective memory
parameters 0.190 and 0.437. Assuming φ12 = 0 allows the series {gn } to have the smaller memory
parameter 0.190, whereas {sn } has the larger memory parameter 0.437. If φ12 were significant
(non-zero), then the series {gn} would be forced to have the larger memory parameter 0.437 as well, which is not what is observed through the local Whittle estimation as noted above.
We conclude this section with a few comments regarding forecasting. Figure 7 presents root-
mean-square (out-of-sample) forecasting errors over the horizons h = 1, . . . , 25, obtained by fitting the indicated one- and two-sided models over a number of sliding windows (200 or 400, resp.) of
varying size (400 and 200, resp.). The two-sided models generally outperform or are comparable
to their one-sided counterparts. We included these forecasts for the sake of completeness, realizing
that assessing the role of LRD in forecasting with ARFIMA models is known to be quite delicate
even in the univariate case (e.g. Ray (1993), Ellis and Wilson (2004)).

Figure 7: Root-mean-square (out-of-sample) forecasting errors over the horizons h = 1, . . . , 25, obtained by fitting the indicated one- and two-sided models over a number of sliding windows (200 or 400, resp.) of varying size (400 and 200, resp.).

References
Baek, C., Kechagias, S. & Pipiras, V. (2018), Asymptotics of bivariate local Whittle estimators
with applications to fractal connectivity, Preprint.

Baillie, R. T. & Morana, C. (2012), ‘Adaptive ARFIMA models with applications to inflation’,
Economic Modelling 29(6), 2451–2459.

Baillie, R. T., Chung, C.-F. & Tieslau, M. A. (1996), ‘Analysing inflation by the fractionally
integrated ARFIMA-GARCH model’, Journal of applied econometrics 11(1), 23–40.

Beran, J., Feng, Y., Ghosh, S. & Kulik, R. (2013), Long-Memory Processes, Springer, Heidelberg.

Brockwell, P. J. & Davis, R. A. (2009), Time Series: Theory and Methods, Springer Series in
Statistics, Springer, New York. Reprint of the second (1991) edition.

Diongue, A. K. (2010), ‘A multivariate generalized long memory model’, Comptes Rendus Mathe-
matique 348(5), 327–330.

Doornik, J. A. & Ooms, M. (2004), ‘Inference and forecasting for ARFIMA models with an appli-
cation to US and UK inflation’, Studies in Nonlinear Dynamics & Econometrics.

Doukhan, P., Oppenheim, G. & Taqqu, M. S. (2003), Theory and Applications of Long-Range
Dependence, Birkhäuser Boston Inc., Boston, MA.

Dueker, M. & Startz, R. (1998), ‘Maximum-likelihood estimation of fractional cointegration with an application to US and Canadian bond rates’, Review of Economics and Statistics 80(3), 420–426.

Dufour, J.-M. & Pelletier, D. (2014), ‘Practical methods for modeling weak VARMA processes:
identification, estimation and specification with a macroeconomic application’, Preprint.

Ellis, C. & Wilson, P. (2004), ‘Another look at the forecast performance of ARFIMA models’,
International Review of Financial Analysis 13(1), 63–81.

Giraitis, L., Koul, H. L. & Surgailis, D. (2012), Large Sample Inference for Long Memory Processes,
Imperial College Press, London.

Helgason, H., Pipiras, V. & Abry, P. (2011), ‘Fast and exact synthesis of stationary multivariate
Gaussian time series using circulant embedding’, Signal Processing 91(5), 1123–1133.

Kechagias, S. & Pipiras, V. (2015), ‘Definitions and representations of multivariate long-range dependent time series’, Journal of Time Series Analysis 36(1), 1–25.

Lobato, I. N. (1997), ‘Consistency of the averaged cross-periodogram in long memory series’, Journal
of Time Series Analysis 18(2), 137–155.

Martin, V. L. & Wilkins, N. P. (1999), ‘Indirect estimation of ARFIMA and VARFIMA models’,
Journal of Econometrics 93(1), 149–175.

Pai, J. & Ravishanker, N. (2009a), ‘Maximum likelihood estimation in vector long memory processes
via em algorithm’, Computational Statistics & Data Analysis 53(12), 4133–4142.

Pai, J. & Ravishanker, N. (2009b), ‘A multivariate preconditioned conjugate gradient approach for maximum likelihood estimation in vector long memory processes’, Statistics & Probability Letters 79(9), 1282–1289.

Palma, W. (2007), Long-Memory Time Series, John Wiley & Sons, Inc., Hoboken, New Jersey,
USA.

Park, K. & Willinger, W. (2000), Self-Similar Network Traffic and Performance Evaluation, Wiley
Online Library.

Peach, R., Rich, R. & Antoniades, A. (2004), ‘The historical and recent behaviour of goods and
services inflation’, Economic Policy Review pp. 19–31.

Pipiras, V. & Taqqu, M. S. (2017), Long-Range Dependence and Self-Similarity, Cambridge Uni-
versity Press, Cambridge.

Ravishanker, N. & Ray, B. K. (1997), ‘Bayesian analysis of vector ARFIMA processes’, Australian
Journal of Statistics 39(3), 295–311.

Ray, B. K. (1993), ‘Modeling long-memory processes for optimal long-range prediction’, Journal of
Time Series Analysis 14(5), 511–525.

Robinson, P. M. (2003), Time Series with Long Memory, Advanced Texts in Econometrics, Oxford
University Press, Oxford.

Robinson, P. M. (2008), ‘Multiple local Whittle estimation in stationary systems’, The Annals of
Statistics 36(5), 2508–2530.

Sela, R. J. (2010), Three essays in econometrics: multivariate long memory time series and applying
regression trees to longitudinal data, PhD thesis, New York University.

Sela, R. J. & Hurvich, C. M. (2009), ‘Computationally efficient methods for two multivariate
fractionally integrated models’, Journal of Time Series Analysis 30(6), 631–651.

Sowell, F. (1986), ‘Fractionally integrated vector time series’, PhD thesis, Duke University.

Tsay, W.-J. (2010), ‘Maximum likelihood estimation of stationary multivariate ARFIMA processes’,
Journal of Statistical Computation and Simulation 80(7), 729–745.
