Friday, 18 August 2017

The Final Polish

We are preparing for releasing version 3.0.0 of the interval package and this last week have mainly been about fixing minor bugs related to the release. I mention two of the more interesting bugs here.

Compact Format

We (Oliver) recently added support for "format compact" when printing intervals. It turns out that the way to determine if compact format is enabled differs very much between different version of Octave. There are at least three different ways to get the information.

In the older releases (< 4.2.0 I believe) you use "get (0, "FormatSpacing")" but there appear to be a bug for version < 4.0.0 for which this always return "loose".

For the current tip of the development branch you can use "[~, spacing] = format ()" to get the spacing.

Finally in between these two version you use "__compactformat__ ()".

In the end Oliver, probably, found a way to handle this mess and compact format should now be fully supported for intervals. The function to do this is available here https://sourceforge.net/p/octave/interval/ci/default/tree/inst/@infsup/private/__loosespacing__.m.

Dot-product of Empty Matrices

When updating "dot" to support N-dimensional arrays I also modified it so that it behaves similar to Octaves standard implementation. The difference is in how it handles empty input. Previously we had

> x = infsupdec (ones (0, 2));
> dot (x, x)
ans = 0×2 interval matrix

but with the new version we get

> dot (x, x)
ans = 1×2 interval vector
   [0]_com   [0]_com

which is consistent with the standard implementation.

In the function we use "min" to compute the decoration for the result. Normally "min (x)" and "dot (x, x)" returns results of the same size (the dimension along which it is computed is set to 1), but they handle empty input differently. We have

> x = ones (0, 2);
> dot (x, x)
ans =
   0   0
> min (x)
ans = [](0x2)

This meant that the decoration would be incorrect since the implementation assumed they always had the same size. Fortunately the solution was very simple. If the dimension along which we are computing the dot-product is zero. the decoration should always be "com". So just adding a check for that was enough.

You could argue that "min (ones (0, 2))" should return "[inf, inf]" similarly to how many of the other reductions, like "sum" or "prod", return their unit for empty input. But this would most likely be very confusing for a lot of people. And it is not compatible with how Matlab does it either.

Updates on the Taylor Package

I have also had some time to work on the Taylor package this week. The basic utility functions are now completed and I have started to work on functions for actually computing with Taylor expansions. At the moment there are only a limited amount of functions implemented. For example we can calculate the Taylor expansion of order 4 for the functions $\frac{e^x + \log(x)}{1 + x}$ at $x = 5$.

## Create a variable of degree 4 and with value 5
> x = taylor (infsupdec (5), 4)
x = [5]_com + [1]_com X + [0]_com X^2 + [0]_com X^3 + [0]_com X^4

## Calculate the function
> (exp (x) + log (x))./(1 + x)
ans = [25.003, 25.004]_com + [20.601, 20.602]_com X + [8.9308, 8.9309]_com X^2 + [2.6345, 2.6346]_com X^3 + [0.59148, 0.59149]_com X^4



Friday, 11 August 2017

Improving the Automatic Tests

Oliver and I have been working on improving the test framework used for the interval package. The package shares a large number of tests with other interval packages through an interval test framework that Oliver created. Here is the repository.

Creating the Tests

Previously these tests were separated from the rest of the package and you usually ran them with help of the Makefile. Now Oliver has moved them to the m-files and you can run them, together with the other tests for the function, with test @infsup/function in Octave. This makes it much easier to test the functions directly.

In addition to making the tests easier to use we also wanted to extend them to not only test scalar evaluation but also vector evaluation. The test data, input ad expected output, is stored in a cell array and when performing the scalar testing we simply loop over that cell and run the function for each element. The actual code looks like this (in this case for plus)

%!test
%! # Scalar evaluation
%! testcases = testdata.NoSignal.infsup.add;
%! for testcase = [testcases]'
%!   assert (isequaln (...
%!     plus (testcase.in{1}, testcase.in{2}), ...
%!     testcase.out));
%! endfor

For testing the vector evaluation we simply concatenate the cell array into a vector and give that to the function. Here is what that code looks like

%! # Vector evaluation
%! testcases = testdata.NoSignal.infsup.add;
%! in1 = vertcat (vertcat (testcases.in){:, 1});
%! in2 = vertcat (vertcat (testcases.in){:, 2});
%! out = vertcat (testcases.out);
%! assert (isequaln (plus (in1, in2), out));

Lastly we also wanted to test evaluation of N-dimensional arrays. This is done by concatenating the data into a vector and then reshape that vector into an N-dimensional array. But what size should we use for the array? Well, we want to have at least three dimensions because otherwise we are not really testing N-dimensional arrays. My solution was to completely factor the length of the vector and use that as size, testsize = factor (length (in1)), and if the length of the vector has two or fewer factors we add a few elements to the end until we get at least three factors. This is the code for that

%!test
%! # N-dimensional array evaluation
%! testcases = testdata.NoSignal.infsup.add;
%! in1 = vertcat (vertcat (testcases.in){:, 1});
%! in2 = vertcat (vertcat (testcases.in){:, 2});
%! out = vertcat (testcases.out);
%! # Reshape data
%! i = -1;
%! do
%!   i = i + 1;
%!   testsize = factor (numel (in1) + i);
%! until (numel (testsize) > 2)
%! in1 = reshape ([in1; in1(1:i)], testsize);
%! in2 = reshape ([in2; in2(1:i)], testsize);
%! out = reshape ([out; out(1:i)], testsize);
%! assert (isequaln (plus (in1, in2), out));

This works very well, except when the number of test cases is to small. If the number of test is less than four this will fail. But there are only a handful of functions with that few tests so I fixed those independently.

Running the tests

Okay, so we have created a bunch of new tests for the package. Do we actually find any new bugs with them? Yes!

The function pow.m failed on the vector test. The problem? In one place $\&\&$ was used instead of $\&$. For scalar input I believe these behave the same but they differ for vector input.

Both the function nthroot.m and the function pownrev.m failed the vector test. Neither allowed vectorization of the integer parameter. For nthroot.m this is the same for standard Octave version so it should perhaps not be treated as a bug. The function pownrev.m uses nthroot.m internally so it also had the same limitation. This time I would however treat it as a bug because the function pown.m does allow vectorization of the integer parameter and if that supports it the reverse function should probably also do it. So I implemented support for vectorization of the integer parameter for both nthroot.m and pownrev.m and they now pass the test.

No problems were found with the N-dimensional tests that the vector tests did not find. This is a good indication that the support for N-dimensional arrays is at least partly correct. Always good to know!

Friday, 28 July 2017

A Package for Taylor Arithmetic

In the last blog post I wrote about what was left to do with implementing support for N-dimensional arrays in the interval package. There are still some things to do but I have had, and most likely will have, some time to work on other things. Before the summer I started to work on a proof of concept implementation of Taylor arithmetic in Octave and this week I have continued to work on that. This blog post will be about that.

A Short Introduction to Taylor Arithmetic

Taylor arithmetic is a way to calculate with truncated Taylor expansions of functions. The main benefit is that it can be used to calculate derivatives of arbitrary order.

Taylor expansion or Taylor series (I will use these words interchangeably) are well known and from Wikipedia we have: The Taylor series of real or complex valued function $f(x)$ that is infinitely differentiable at a real or complex number $a$ is the power series
$$
f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + ....
$$
From the definition it is clear that if we happen to know the coefficients of the Taylor series of $f$ at the point $a$ we can also calculate all derivatives of $f$ at that point by simply multiplying a coefficient with the corresponding factorial.

The simplest example of Taylor arithmetic is addition of two Taylor series. If $f$ has the Taylor series $\sum_{n=0}^\infty (f)_n (x-a)^n$ and $g$ the Taylor series $\sum_{n=0}^\infty (g)_n (x-a)^n$ then $f + g$ will have the Taylor series
$$
\sum_{n=0}^\infty (f + g)_n (x-a)^n = \sum_{n=0}^\infty ((f)_n + (g)_n)(x-a)^n$
$$
If we instead consider the product, $fg$, we get
$$
\sum_{n=0}^\infty (fg)_n (x-a)^n = \sum_{n=0}^\infty \left(\sum_{i=0}^n (f)_n(g)_n\right)(x-a)^n.
$$

With a bit of work you can find similar formulas for other standard functions. For example the coefficients, $(e^f)_n$, of the Taylor expansion of $\exp(f)$ is given by $(e^f)_0 = e^{(f)_0}$ and for $n > 0$
$$
(e^f)_n = \frac{1}{n}\sum_{i=0}^{n-1}(k-j)(e^f)_i(f)_{n-i}.
$$

When doing the computations on a computer we consider truncated Taylor series, we choose an order and keep only coefficients up to that order. There is also nothing that stops us from using intervals as coefficients, this allows us to get rigorous enclosures of derivatives of functions.

For a more complete introduction to Taylor arithmetic in conjunction with interval arithmetic see [1] which was my first encounter to it. For another implementation of it in code take a look at [2].

Current Implementation Status

As mentioned in the last post my repository can be found here

When I started to write on the package, before summer, my main goal was to get something working quickly. Thus I implemented the basic functions needed to do some kind of Taylor arithmetic, a constructor, some help functions and a few functions like $\exp$ and $\sin$.

This last week I have focused on implementing the basic utility functions, for example $size$, and rewriting the constructor. In the process I think I have broken the arithmetic functions, I will fix them later.

You can at least create and display Taylor expansions now. For example creating a variable $x$ with value 5 of order 3

> x = taylor (infsupdec (5), 3)
x = [5]_com + [1]_com X + [0]_com X^2 + [0]_com X^3

or a matrix with 4 variables of order 2

> X = taylor (infsupdec ([1, 2; 3, 4]), 2)
X = 2×2 Taylor matrix of order 2

ans(:,1) =

   [1]_com + [1]_com X + [0]_com X^2
   [3]_com + [1]_com X + [0]_com X^2

ans(:,2) =

   [2]_com + [1]_com X + [0]_com X^2
   [4]_com + [1]_com X + [0]_com X^2

If you want you can create a Taylor expansion with explicitly given coefficients you can do that as well

> f = taylor (infsupdec ([1; -2; 3, -4))
f = [1]_com + [-2]_com X + [3]_com X^2 + [-4]_com X^3

This would represent a function $f$ with $f(a) = 1$, $f'(a) = -2$, $f''(a) = 3 \cdot 2! = 6$ and $f'''(a) = -4 \cdot 3! = -24$.

Creating a Package

My goal is to create a full package for Taylor arithmetic along with some functions making use of it. The most important step is of course to create a working implementation but there are other things to consider as well. I have a few things I have not completely understood about it. Depending on how much time I have next week I will try to read a bit more about it probably ask some questions on the mailing list. Here are at least some of the things I have been thinking about

Mercurial vs Git?

I have understood that most of the Octave forge packages uses Mercurial for version control. I was not familiar with Mercurial before so the natural choice for me was to use Git. Now I feel I could switch to Mercurial if needed but I would like to know the potential benefits better, I'm still new to Mercurial so I don't have the full picture. One benefit is of course that it is easier if most  packages use the same system, but other than that?

How much work is it?

If I were to manage a package for Taylor arithmetic how much work is it? This summer I have been working full time with Octave so I have had lots of time but this will of course not always be the case. I know it takes time if I want to continue to improve the package, but how much, and what, continuous work is there?

What is needed besides the implementation?

From what I have understood there are a couple of things that should be included in a package besides the actual m-files. For example a Makefile for creating the release, an INDEX-file and a CITATION-file. I should probably also include some kind of documentation, especially since Taylor arithmetic is not that well known. Is there anything else I need to think about?

What is the process to get a package approved?

If I were to apply (whatever that means) for the package to go to Octave forge what is the process for that? What is required before it can be approved and what is required after it is approved?


[1] W. Tucker, Validated Numerics, Princeton University Press, 2011.
[2] F. Blomquist, W. Hofschuster, W. Krämer, Real and complex taylor arithmetic in C-XSC, Preprint 2005/4, Bergische Universität Wuppertal.

Friday, 14 July 2017

Ahead of the Timeline

One of my first posts on this blog was a timeline for my work during the project. Predicting the amount of time something takes is always hard to do. Often time you tend to underestimate the complexity of parts of the work. This time however I overestimated the time the work would take.

If my timeline would have been correct I would have just started to work on Folding Functions (or reductions as they are often called). Instead I have completed the work on them and also for functions regarding plotting. In addition I have started to work on the documentation for the package as well as checking everything an extra time.

In this blog post I will go through what I have done this week, what I think is left to do and a little bit about what I might do if I complete the work on N-dimensional arrays in good time.

This Week

The Dot Function

The $dot$-function was the last function left to implement support for N-dimensional arrays in. It is very similar to the $sum$-function so I already had an idea of how to do it. As with  $sum$ I moved most of the handling of the vectorization from the m-files to the oct-file, the main reason being improved performance.

The $dot$-functions for intervals is actually a bit different from the standard one. First of all it supports vectorization which the standard one does not

> dot ([1, 2, 3; 4, 5, 6], 5)
error: dot: size of X and Y must match
> dot (infsupdec ([1, 2, 3; 4, 5, 6], 5)
ans = 1x3 interval vector

  [25]_com   [35]_com   [45]_com

It also treats empty arrays a little different, see bug #51333,

> dot ([], [])
ans = [](1x0)
> dot (infsupdec ([]), [])
ans = [0]_com



Package Documentation

I have done the minimal required changes to the documentation. That is I moved support for N-dimensional arrays from Limitation to Features and added some simple examples on how to create N-dimensional arrays.

Searching for Misses

During the work I have tried to update the documentation for all functions to account for the support of N-dimensional arrays and I have also tried to update some of the comments for the code. But as always, especially when working with a lot of files, you miss things, both in the documentation and old comments.

I did a quick grep for the words "matrix" and "matrices" since they are candidates for being changed to "array". Doing this I found 35 files where I missed things. It was mainly minor things, comments using the "matrix" which I now changed to "array", but also some documentation which I had forgotten to update.

What is Left?

Package Documentation - Examples

As mentioned above I have done the minimal required changes to the documentation. It would be very nice to add some more interesting example using N-dimensional arrays of intervals in a useful way. Ironically I have not been able to come up with an interesting example but I will continue to think about it. If you have an example that you think would be interesting and want to share, please let me know!

Coding Style

As I mentioned in one of the first blog posts, the coding style for the interval package was not following the standard for Octave. During my work I have adapted all files I have worked with to the coding standard for Octave. A lot of the files I have not needed to do any changes to, so they are still using the old style. It would probably be a good idea to update them as well.

Testing - ITF1788

The interval framwork libary developed by Oliver is used to test the correctness of many of the functions in the package. At the moment it tests evaluation of scalars but in principle it should be no problem to use it for testing vectorization or even broadcasting. Oliver has already started to work on this.

After N-dimensional arrays?

If I continue at this pace I will finish the work on N-dimensional arrays before the time of the project is over. Of course the things that are left might take longer than expected, they usually do, but there is a chance that I will have time left after everything is done. So what should I do then? There are more thing that can be done on the interval package, for example adding more examples to the documentation, however I think I would like to start working on a new package for Taylor arithmetics.

Before GSoC I started to implement a proof of concept for Taylor arithmetics in Ocatve which can be found here. I would then start to work on implementing a proper version of it, where I would actually make use of N-dimensional interval arrays. If I want to create a package for this I would also need to learn a lot of other things, one of them being how to manage a package on octave forge.

At the moment I will try to finish my work on N-dimensional arrays. Then I can discuss it with Oliver and see what he thinks about it.

Friday, 7 July 2017

Set inversion with fsolve

This week my work have mainly been focused on the interval version of fsolve. I was not sure if and how this could make use of N-dimensional arrays and to find that out I had to understand the function. In the end it turned out that the only generalization that could be done were trivial and required very few changes. However I did find some other problems with the functions that I have been able to fix. Connected to fsolve are the functions ctc_intersect and ctc_union. The also needed only minor changes to allow for N-dimensional input. I will start by giving an introduction to fsolve, ctc_union and ctc_intersect and then I will mention the changes I have done to them.

Introduction to fsolve

The standard version of fsolve in Octave is used to solve systems of nonlinear equations. That is, given a functions $f$ and a starting point $x_0$ it returns a value $x$ such that $f(x)$ is close to zero. The interval version of fsolve does much more than this. It is used to enclose the preimage of a set $Y$ under $f$. Given a domain $X$, a set $Y$ and a function $f$ it returns an enclosure of the set
$$
f^{-1}(Y) = \{x \in X: f(x) \in Y\}.
$$
By letting $Y = 0$ we get similar functionality as the standard fsolve, with the difference that the output is an enclosure of all zeros to the function (compared to one point for which $f$ returns close to zero).

Example: The Unit Circle

Consider the function $f(x, y) = \sqrt{x^2 + y^2} - 1$ which is zero exactly on the unit circle. Plugging this in to the standard fsolve we get with $(0.5, 0.5)$ as a starting guess

> x = fsolve (@(x) f(x(1), x(2)), [0.5, 0.5])
x = 0.70711 0.70711

which indeed is close to a zero. But we get no information about other zeros.

Using the interval version of fsolve with $X = [-3, 3] \times [-3, 3]$ as starting domain we get

> [x paving] = fsolve (f, infsup ([-3, -3], [3, 3]));
> x
x ⊂ 2×1 interval vector

     [-1.002, +1.002]
   [-1.0079, +1.0079]

Plotting the paving we get the picture

which indeed is a good enclosure of the unit circle.

How it works

In its simplest form fsolve uses a simple bisection scheme to find the enclosure. Using interval methods we can find enclosure to images of sets. Given a set $X_0 \subset X$ we have three different possibilities
  • $f(X_0) \subset Y$ in which case we add $X_0$ to the paving
  • $f(X_0) \cap Y = \emptyset$ in which case we discard $X_0$
  • Otherwise we bisect $X_0$ and continue on the parts
By setting a tolerance of when to stop bisecting boxes we get the algorithm to terminate in a finite number of steps.

Contractors

Using bisection is not always very efficient, especially when the domain has many dimensions. One way to speed up the convergence is with what's called contractors. In short a contractor is a function that can take the set $X_0$ and returns a set $X_0' \subset X_0$ with the property that $f(X_0 \setminus X_0') \cap Y = \emptyset$. It's a way of making $X_0$ smaller without having to bisect it that many times.

When you construct a contractor you use the reverse operations definer on intervals. I will not go into how this works, if you are interested you can find more information in the package documentation [1] and in these youtube videos about Set Inversion Via Interval Analysis (SIVIA) [2].

The functions ctc_union and ctc_intersect are used to combine contractors on sets into contractors on unions or intersections of these sets.

Generalization to N-dimensional arrays

How can fsolve be generalized to N-dimensional arrays? The only natural thing to do is to allow for the input and output of $f$ to be N-dimensional arrays. This also is no problem to do. While you mathematically probably would say that fsolve is used to do set inversion for functions $f: \mathbb{R}^n \to \mathbb{R}^m$ it can of course also be used for example on functions $f: \mathbb{R}^{n_1}\times \mathbb{R}^{n_2} \to \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$.

This is however a bit different when using vectorization. When not using vectorization (and not using contractions) fsolve expects that the functions takes one argument which is an array with each element corresponding to a variable. If vectorization is used it instead assumes that the functions takes one argument for each variable. Every argument is then given as a vector with each element corresponding to one value of the variable for which to compute the function. Here we have no use of N-dimensional arrays.

Modifications

The only change in functionality that I have done to the functions is to allow for N-dimensional arrays as input and output when vectorization is not used. This required only minor changes, essentially changing expressions like
max (max (wid (interval)))

to
max (wid (interval)(:))

It was also enough to do these changes in ctc_union and ctc_intersect to have these support N-dimensional arrays.

I have made no functional changes when vectorization is used. I have however made an optimization in the construction of the arguments to the function. The arguments are stored in an array but before being given to the function they need to be split up into the different variables. This is done by creating a cell array with each element being a vector with the values of one of the variables. Previously the construction of this cell array was very inefficient, it split the interval into its lower and upper part and then called the constructor to create an interval again. Now it copies the intervals into the cell without having to call the constructor. This actually seems have been quite a big improvement, using the old version the example with the unit circle from above took around 0.129 seconds and with the new version it takes about 0.092 seconds. This is of course only one benchmark, but a speed up of about 40% for this test is promising!

Lastly I noticed a problem in the example used in the documentation of the function. The function used is

# Solve x1 ^ 2 + x2 ^ 2 = 1 for -3 ≤ x1, x2 ≤ 3 again,
# but now contractions speed up the algorithm.
function [fval, cx1, cx2] = f (y, x1, x2)
  # Forward evaluation
  x1_sqr = x1 .^ 2;
  x2_sqr = x2 .^ 2;
  fval = hypot (x1, x2);
  # Reverse evaluation and contraction
  y = intersect (y, fval);
  # Contract the squares
  x1_sqr = intersect (x1_sqr, y - x2_sqr);
  x2_sqr = intersect (x2_sqr, y - x1_sqr);
  # Contract the parameters
  cx1 = sqrrev (x1_sqr, x1);
  cx2 = sqrrev (x2_sqr, x2);
endfunction

Do you see the problem? I think it took me more than a day to realize that the problems I was having was not because of a bug in fsolve but because this function computes the wrong thing. The function is supposed to be $f(x_1, x_2) = x_1^2 + x_2^2$ but when calculating the value it calls hypot which is given by $hypot(x_1, x_2) = \sqrt{x_1^2 + x_2^2}$. For $f(x_1, x_2) = 1$, which is used in the example, it gives the same result, but otherwise it will of course not work.

[1] https://octave.sourceforge.io/interval/package_doc/Parameter-Estimation.html#Parameter-Estimation
[2] https://www.youtube.com/watch?v=kxYh2cXHdNQ


Friday, 30 June 2017

One Month In

Now one month of GSoC has passed and so far everything has gone much better than I expected! According to my timeline this week would have been the first of two were I work on vectorization. Instead I have already mostly finished the vectorization and have started to work on other things. In this blog post I'll give a summary of what work I have completed and what I have left to do. I'll structure it according to where the functions are listed in the $INDEX$-file [1]. The number after the heading is the number of functions in that category.

Since this will mainly be a list of which files have been modified and which are left to do this might not be very interesting if you are not familiar with the structure of the interval package.

Interval constant (3)

All of these have been modified to support N-dimensional arrays.

Interval constructor (5)

All of these have been modified to support N-dimensional arrays.

Interval function (most with tightest accuracy) (63)

Almost all of these functions worked out of the box! At least after the API functions to the MPFR and crlibm libraries were fixed, they are further down in the list.

The only function that did not work immediately were $linspace$. Even though this function could be generalized to N-dimensinal arrays the standard Octave function only works for matrices (I think the Matlab version only allows scalars). This means that adding support for N-dimensional vectors for the interval version is not a priority. I might do it later on but it is not necessary.

Interval matrix operation (16)

Most of the matrix functions does not make sense for N-dimensional arrays. For example matrix multiplication and matrix inversion only makes sense for matrices. However all of the reduction functions are also here, they include $dot$, $prod$, $sum$, $sumabs$ and $sumsq$.

At the moment I have implemented support for N-dimensional arrays for $sum$, $sumabs$ and $prod$. The functions $dot$ and $sumsq$ are not ready, I'm waiting to see what happens with bug #51333 [2] before I continue with that work. Depending on the bug I might also have to modify the behaviour of $sum$, $sumabs$ and $prod$ slightly.

Interval comparison (19)

All of these have been modified to support N-dimensional arrays.

Set operation (7)

All of these functions have been modified to support N-dimensional arrays except one, $mince$. The function $mince$ is an interval version of $linspace$ and reasoning here is the same as that for $linspace$ above.

Interval reverse operation (12)

Like the interval functions above, all of the functions worked out of the box!

Interval numeric function (11)

Also these functions worked out of the box, with some small modifications to the documentation for some of them.

Interval input and output (9)

Here there are some functions which require some comments, the ones I do not comment about have all gotten support for N-dimensional arrays.

$interval\_bitpack$
I think that this function does not make sense to generalize to N-dimensions. It could perhaps take an N-dimensional arrays as input, but it will always return a row vector.  I have left it as it is for now at least.

$disp$ and $display$
These are functions that might be subject to change later on. At the moment it prints N-dimensional arrays of intervals in the same way Octave does for normal arrays. It's however not clear how to handle the $\subset$ symbol and we might decide to change it.

Interval solver or optimizer (5)

The functions $gauss$ and $polyval$ are not generalizable to N-dimensional vectors. I don't think that $fzero$ can be generalized either, for it to work the functions must be real-valued.

The function $fsolve$ can perhaps be modified to support N-dimensional vectors. It uses the SIVIA algorithm [3] and I have to dive deeper into how it works to see if it can be done.

For $fminsearch$ nothing needed to be done, it worked for N-dimensional arrays directly.

Interval contractor arithmetic (2)

Both of these functions are used together with $fsolve$ so they also depend on if SIVIA can be generalized or not.

Verified solver or optimizer (6)

All of these functions work on matrices and cannot be generalized.

Utility function (29)

All of these for which it made sense have been modified to support N-dimensional arrays. Some of them only works for matrices, these are $ctranspose$, $diag$, $transpose$, $tril$ and $triu$. I have left them as they were, though I fixed a bug in $diag$.

API function to the MPFR and crlibm libraries (8)

These are the functions that in general required most work. The ones I have added full support for N-dimensional arrays in are $crlibm\_function$, $mpfr\_function\_d$ and $mpfr\_vector\_sum\_d$. Some of them cannot be generalized, these are $mpfr\_matrix\_mul_d$, $mpfr\_matrix\_sqr\_d$ and $mpfr\_to\_string\_d$. The functions $mpfr\_linspace\_d$ and $mpfr\_vector\_dot\_d$ are related to what I mentioned above for $linspace$ and $dot$.

Summary

So summing up the functions that still require some work are
  • Functions related to $fsolve$
  • The functions $dot$ and $sumsq$
  • The functions $linspace$ and $mince$
Especially the functions related to $fsolve$ might take some time to handle. My goal is to dive deeper into this next week.

Apart from this there are also some more things that needs to be considered. The documentation for the package will need to be updated. This includes adding some examples which make use of the new functionality.

The interval package also did not follow the coding style for Octave. All the functions which I have made changes to have been updated with the correct coding style, but many of the functions that worked out of the box still use the old style. It might be that we want to unify the coding standard for all files before the next release.

[1] The $INDEX$ file https://sourceforge.net/u/urathai/octave/ci/default/tree/INDEX
[2] Bug #51333 https://savannah.gnu.org/bugs/index.php?51333
[3] The SIVIA algorithm https://www.youtube.com/watch?v=kxYh2cXHdNQ

Thursday, 22 June 2017

Vectorization and broadcasting

At the moment I'm actually ahead of my schedule and this week I started to work on support for vectorization on N-dimensional arrays. The by far biggest challenge was to implement proper broadcasting and most of this post will be devoted to going through that. At the end I also mention some of the other things I have done during the week.

Broadcasting arrays

At the moment I have implement support for broadcasting on all binary functions. Since all binary functions behave similarly in respect to broadcasting I will use $+$ in all my example below, but this could in principle be any binary function working on intervals.

When adding to arrays, $A, B$, of the same size the result is just an arrays of the same size with each entry containing the sum of the corresponding entries in $A$ and $B$. If $A$ and $B$ does not have the same size then we try to perform broadcasting. The simplest form of broadcasting is when $A$ is an arrays and $B$ is a scalar. Then we just take the value of $B$ and add to every element in $A$. For example

> A = infsupdec ([1, 2; 3, 4])
A = 2×2 interval matrix
   [1]_com   [2]_com
   [3]_com   [4]_com
> B = infsupdec (5)
B = [5]_com
> A + B
ans = 2×2 interval matrix
   [6]_com   [7]_com
   [8]_com   [9]_com

However it is not only when one of the inputs is a scalar that broadcasting can be performed. Broadcasting is performed separately for each dimension of the input. We require either that the dimensions are equal, and no broadcasting is performed, or that one of the inputs have that dimension equal to $1$, we then concatenate this input along that dimension until they are of equal size. If for example $A$ has dimension $4\times4\times4$ and $B$ dimension $4\times4\times1$ we concatenate $B$ with itself along the third dimension four times to get two arrays of the same size. Since a scalar has all dimensions equal to 1 we see that it can be broadcasted to any size. Both $A$ and $B$ can also be broadcasted at the same time, along different dimensions, for example

> A = infsupdec (ones (1, 5, 2))
A = 1×5×2 interval array
ans(:,:,1) =
   [1]_com   [1]_com   [1]_com   [1]_com   [1]_com
ans(:,:,2) =
   [1]_com   [1]_com   [1]_com   [1]_com   [1]_com
> B = infsupdec ([1, 2, 3, 4, 5; 6, 7, 8, 9, 10])
B = 2×5 interval matrix
   [1]_com   [2]_com   [3]_com   [4]_com    [5]_com
   [6]_com   [7]_com   [8]_com   [9]_com   [10]_com
> A + B
ans = 2×5×2 interval array
ans(:,:,1) =
   [2]_com   [3]_com   [4]_com    [5]_com    [6]_com
   [7]_com   [8]_com   [9]_com   [10]_com   [11]_com
ans(:,:,2) =
   [2]_com   [3]_com   [4]_com    [5]_com    [6]_com
   [7]_com   [8]_com   [9]_com   [10]_com   [11]_com

The implementation

I'll go through a little bit about my implementation. I warn you that I'm not that familiar with the internals of Octave so some things I say might be wrong, or at least not totally correct.

Internally all, numerical, arrays are stored as a linear vector and the dimensions are only metadata. This means that the most efficient way to walk through an array is with a linearly increasing index. When $A$ and $B$ have the same size the most efficient way to sum them is to linearly go through the arrays. In pseudo code

// Calculate C = A + B
for (int i = 0; i < numel (A); i++) {
  C(i) = A(i) + B(i);
}

This works fine, and apart from unrolling the loop or doing optimizations like that it is probably the most efficient way to do it.

If $A$ and $B$ are not of the same size then one way to do it would be to simply extend $A$ or/and $B$ along the needed dimensions. This would however require coping a lot of data, something we want to avoid (memory access is expensive). Instead we try to be smart with our indexing to access the right data from both $A$ and $B$.

After asking on the IRC-channel I got pointed to this Octave function which performs broadcasting. My implementation, which can be found here, is heavily inspired by that function.

Performance

Here I compare the performance of the new implementation with the old one. Since the old one could only handle matrices we are limited by that. We can measure the time it takes to add two matrices $A$, $B$ with the code

tic; A + B; toc;

We do 10 runs for each test and all times are in seconds.

Addition of large matrices

Case 1: A = B = infsupdec (ones (1000, 1000));
       Old         New
       0.324722    0.277179
       0.320914    0.276116
       0.322018    0.276075
       0.318713    0.279258
       0.332041    0.279593
       0.318429    0.279987
       0.323752    0.279089
       0.317823    0.276036
       0.320509    0.280964
       0.320610    0.281123
Mean:  0.32195     0.27854
Case 2: A = B = infsupdec (ones (10, 100000));
        Old         New
        0.299321    0.272691
        0.297020    0.282591
        0.296460    0.274298
        0.294541    0.279661
        0.298306    0.277274
        0.301532    0.275531
        0.298163    0.278576
        0.298954    0.279868
        0.302849    0.275991
        0.297765    0.278806
Mean:   0.29849    0.27753

Case 3: A = B = infsupdec (ones (100000, 10));
        Old         New
        0.286433    0.279107
        0.289503    0.278251
        0.297562    0.279579
        0.292759    0.283311
        0.292983    0.281306
        0.290947    0.282310
        0.293025    0.286172
        0.294153    0.278886
        0.293457    0.278625
        0.296661    0.280804
Mean:   0.29275     0.28084

Broadcasting scalars

Case 4: A = infsupdec (ones (1000, 1000));
             B = infsupdec (1);
        Old         New
        0.298695    0.292419
        0.298158    0.292274
        0.305242    0.296036
        0.295867    0.291311
        0.296971    0.297255
        0.304297    0.292871
        0.298172    0.300329
        0.297251    0.291668
        0.299236    0.294128
        0.300457    0.298005
Mean;   0.29943     0.29463

Case 5: A = infsupdec (1);
             B = infsupdec (ones (1000, 1000));
         Old         New
        0.317276    0.291100
        0.316858    0.296519
        0.316617    0.292958
        0.316159    0.299662
        0.317939    0.301558
        0.322162    0.295338
        0.321277    0.293561
        0.314640    0.291500
        0.317211    0.295487
        0.317177    0.294376
Mean:   0.31773     0.29521

Broadcasting vectors

Case 6: A = infsupdec (ones (1000, 1000));
             B = infsupdec (ones (1000, 1));
        Old         New
        0.299546    0.284229
        0.301177    0.284458
        0.300725    0.276269
        0.299368    0.276957
        0.303953    0.278034
        0.300894    0.275058
        0.301776    0.276692
        0.302462    0.282946
        0.304010    0.275573
        0.301196    0.273109
Mean:   0.30151     0.27833

Case 7: A = infsupdec (ones (1000, 1000));
             B = infsupdec (ones (1, 1000));
         Old         New
        0.300554    0.295892
        0.301361    0.294287
        0.302575    0.299116
        0.304808    0.294184
        0.306700    0.291606
        0.301233    0.298059
        0.301591    0.292777
        0.302998    0.290288
        0.300452    0.291975
        0.305531    0.290178
Mean:   0.30278     0.29384

We see that in all cases the new version is faster or at least equally fast as the old version. In the old version the order of the input made a slight difference in performance (case 4 vs case 5). In the new version both inputs are treated in exactly the same way so we no longer see that difference.

Possible improvements

In theory the cases when we broadcast a scalar could be the fastest ones. If $B$ is a scalar we could, in pseudo code, do something similar to
// Calculate C = A + B with B scalar
for (int i = 0; i < numel (A); i++) {
  C(i) = A(i) + B;
}

This is however not implemented at the moment. Instead we use the ordinary routine to calculate the index for $B$ (since it is a scalar it will always evaluate to $1$). If we would like to optimize more for this case we could add a check for if $A$ or $B$ are scalars and then optimize for that. Of course this would also make the code more complicated, something to watch out for. At the moment I leave it like this but if we later want to optimize for that case it could be done.

Other work

Apart from the work to fix the broadcasting for binary functions there were very little to do for many of the functions. All binary functions that use this code, and all unary functions using an even simpler code, worked directly after fixing the oct-files. Some of them required small changes to the documentation but other than that the octave-scripts were fine. So mainly it has been a matter of actually going through all files and check that they actually did work.

Bug #51283

When going through all the functions I noticed a bug in the interval version of $\sin$,

 > sin (infsupdec (0))
ans = [0]_com
> sin (infsupdec ([0, 0]))
ans = 1×2 interval vector
   [0, -0]_com   [0, -0]_com

The second version here is wrong, $-0$ should never be allowed as a value for the supremum of an interval. I was able to track this down to how Octaves $\max$ function works, see bug #51283. As Oliver writes there the exact behaviour of the $\max$-function is not specified in IEEE Std 754-2008 so we cannot rely on that. To solve this I have added a line to manually set all $-0$ to $+0$ in the supremum of the interval.

 

Friday, 16 June 2017

Construction and Printing

This week I have started to work on methods for constructing and printing N-dimensional arrays of intervals. In my timeline I estimated that this work would take 2 weeks. However in this first week I have managed to complete most of the work. I will give some comments on how I have worked with the Mercurial repository, how the work went and different things I encountered along the path.

Working with Mercurial

This is essentially the first time I'm using Mercurial for revision control, though I have used git before. However I quickly learned how to use it for the basic tasks that I need, committing, comparing files and checking the history. As mentioned in a previous post you can find my repository here [1].

Coding style

When I started to work with the files I realized that they did not follow Octaves coding standard [2]. After a short discussion on the mailing list we decided that I will update the files I change to follow the standard coding style. Usually it is not a good idea to change coding style and add functionality in the same commit. However most of the changes to coding style are only white space changes so they can be ignored using the -w flag in Mercurial. Thus we decided that as long as the coding style changes are only such that it is ignored with -w I will do it in the same commit as the added functionality. If there are some coding style changes that's not only white space, the most common example is to long lines, I do a commit with only changes to the coding style first. So if you want to take a look at the functionality I have added you will probably want to use the -w flag. Note however that I have not updated the coding style for any files I have not changed otherwise.

Committing

Normally I do one commit for each file, though in many cases the bare intervals and the decorated intervals have almost identical functions and in that case I commit changes to them both at the same time. Of course it also happens that I have to go back and do more changes to a files, in that case I just do another commit.

The actual work

The work went much faster than I expected. The main reason for this is that Octave has very good support for indexing. For example expressions like

isnai(x.inf <= x.sup) = false;

works just as well for matrices as for N-dimensional arrays. In fact the constructor for bare intervals even worked for N-dimensional arrays from the beginning, there I only had to do slight modification to the documentation and add some tests!

Not all functions were that easy though. Some functions that have not been updated in a while clearly assumed the input was a matrix, for example in $hull$

sizes1 = cellfun ("size", l, 1);
sizes2 = cellfun ("size", l, 2);


In most cases I only needed to add more general indexing, often times even making the code clearer.

In some functions all I had to do was to remove the check on the input data so that it would accept N-dimensional arrays. This was true in for example $cat$ were all I had to do was to remove the check and do some minor modifications to the documentation.

I can conclude with saying that Octave has great support for working with N-dimensional arrays. Since internally the data for intervals are stored only as arrays I could make good use of it!

Noteworthy things

While most functions were straight forward to modify some required some thought. How should they even work for N-dimensional input?

Disp

When modifying the $disp$-function I chose to mimic how Octave handles displaying N-dimensional arrays. I noticed that this is different from how Matlab handles it. In Matlab we have

> x = zeros (2, 2, 2)

x(:,:,1) =

     0     0
     0     0


x(:,:,2) =

     0     0
     0     0


while in Octave it's

> x = zeros (2, 2, 2)
x =

ans(:,:,1) =

   0   0
   0   0

ans(:,:,2) =

   0   0
   0   0


I don't know the choice behind Octaves version. At least at first glance I think I prefer the way Matlab does it. But since I'm working in Octave I chose that style.

The next question was how to handle the subset symbol, $\subset$. The interval package uses $=$ or $\subset$ depending on if the string representation is exact or not. For example

> x = infsup (1/2048, 1 + 1/2048);
> format short; x
x ⊂ [0.00048828, 1.0005]
> format long; x
x = [0.00048828125, 1.00048828125]

How should this be handled for N-dimensional arrays? One way would be to switch all $=$ to $\subset$ is the representation is not exact. Another to use $\subset$ on all submatrices that does not have an exact string representation. The third way, and how it is implemented now, is to only change the first $=$ to $\subset$, the one after the variable name. Like this

> x(1,1,1:2) = infsup (1/2048, 1 + 1/2048)
x ⊂ 1×1×2 interval array

ans(:,:,1) =   [0.00048828, 1.0005]
ans(:,:,2) =   [0.00048828, 1.0005]


This might be a bit odd when you first look at it, on some places we use $=$ and on some $\subset$. Though I think it somehow makes sense, we are saying that $x$ is a subset of the $1\times1\times2$ interval array given by

ans(:,:,1) =   [0.00048828, 1.0005]
ans(:,:,2) =   [0.00048828, 1.0005]

which actually is true. Anyway I will leave like this for now and then we might decide to switch it up later.

linspace and mince

The standard implementation of $linspace$ only supports scalar or vector input. It could be generalized to N-dimensional arrays by for example returning a N+1-dimensional array were the last dimension corresponds to the linearly spaced elements. But since this has not been done in the standard implementation I will at least wait with adding for intervals.

The function $mince$ can be seen as a interval generalization of $linspace$. It  takes an interval and returns an array of intervals whose union cover it. This could similarly be expanded to N dimensions by creating the array along the N+1 dimension. But again we choose to at least wait with adding this.

meshgrid and ndgrid

The interval package already has an implementation of $meshgrid$. But since it previously did not support 3-dimensional arrays it had to fit 3-d data in a 2-d matrix. Now that it supports 3-d data it can output that instead.

Currently the interval package does not implement $ndgrid$. When I looked into it I realized that the standard implementation of $ndgrid$ actually works for interval arrays as well. I have not looked into the internals but in principle it should only need the $cat$ function, which is implemented for intervals. Further I noticed that the standard $meshgrid$ also works for intervals. However the interval implementation differs in that it converts all input to intervals, were as the standard implementation allows for non-uniform output. Using the interval implementation of $meshgrid$ we have

> [X Y] = meshgrid (infsup (1:3), 4:6)
X = 3×3 interval matrix

   [1]   [2]   [3]
   [1]   [2]   [3]
   [1]   [2]   [3]

Y = 3×3 interval matrix

   [4]   [4]   [4]
   [5]   [5]   [5]
   [6]   [6]   [6]

but if we fall back to the standard implementation (by removing the interval implementation) we get

> [X Y] = meshgrid (infsup (1:3), 4:6)
X = 3×3 interval matrix

   [1]   [2]   [3]
   [1]   [2]   [3]
   [1]   [2]   [3]

Y =

   4   4   4
   5   5   5
   6   6   6

Note that the last matrix is not an interval matrix.

So the question is, should we implement a version of $ndgrid$ that converts everything to intervals or should we remove the implementation of $meshgrid$? It's at least most likely not a good idea that the functions are different. I think that removing the implementation of $meshgrid$ makes most sense. First of all it's less code to maintain, which is always nice. Secondly you can manually convert all input to the function to intervals if you want uniform output. If you do not want uniform output then the standard implementation works were as the interval implementation does not, so the standard implementation is more general in a sense.

We have to choose what to do, but for now I leave it as it is.

Non-generalizable functions

From what I have found there is no way to create a 3-dimensional array in Octave in the same way you can create a 2-dimensional one with for example

M = [1, 2; 3, 4];

Instead higher dimensional arrays have to be created using other functions, for example $reshape$ or $zeros$, or by specifying the submatrices directly

M(:,:,1) = [1, 2; 3, 4];
M(:,:,2) = [5, 6; 7, 8];

This means that the functions $\_\_split\_interval\_literals\_\_$, which is used to split a string like $"[1, 2; 3, 4]"$ into its separate components, cannot really be generalized to N dimensions.


[1] https://sourceforge.net/u/urathai/octave/ci/default/tree/
[2] http://wiki.octave.org/Octave_style_guide

Wednesday, 31 May 2017

The first day and my repository

Yesterday I handed in the last exam for the semester(!) and today I started to work on the project full time. I begun by setting up a repository for my code. At the moment I have only cloned the interval package, I'm still to make my own changes to it.

I started to read the constructor for intervals to see how it works. While doing that I realized that there is much less to do than I thought. The constructor can actually handle creating intervals from numerical arrays of any dimension. The function displaying the interval can only handle up to two dimension but if you look at the internal state it actually has more dimensions. It is not perfect though, it does not work for decorated intervals and it cannot handle strings and cells as input. But still, it makes it much easier for me.

I will continue to work and hopefully I have actually committed something by the end of the week. Next week I will be away but after that I will code for the rest of the summer.

Sunday, 28 May 2017

Timeline

This post will be about specifying a timeline for my work this summer. As I mentioned in the introductory post the work will be about implementing support for higher dimensional arrays in the interval package. To begin with I have divided the work into 5 different parts:

1. Construction and Printing

This part will be about implementing functions for creating and printing arrays. It will mainly consist of modifying the standard constructor and all the different functions used for printing intervals so they can handle higher dimensional arrays.

2. Vectorized Functions

Here I will work on generalizing all functions supporting vectorization to also support arrays of higher dimensions.

3. Folding Functions

Here I will work on generalizing all functions implementing some sort of folding to support higher dimensions. By folding I mean taking a multidimensional array as input and returning an array of lower dimension. Examples of these functions are $sum$, $prod$ and $dot$.

4. Plotting

I'm not sure what support Octave has for plotting higher dimensional arrays, but if there are some functions which could also be useful for intervals I will try to implement them here.

5. Documentation

I'll write the documentation alongside the rest of the work. In the end I will try to add some usage examples and integrate it better with the standard documentation.

So these are the parts in which I have divided my work in and the timeline will be

  • Phase 1 (30/5 - 30/6)
    • Week 1: Setting up
    • Week 2-3: Construction and Printing
    • Week 4: Vectorized Functions
  • Phase 2 (3/7 - 28/7)
    • Week 5: Continue on Vectorized Functions 
    • Week 6-7: Folding Functions
    • Week 8: Plotting
  • Phase 3 (31/7 - 25/8)
    • Week 9: Continue on Plotting
    • Week 10-11: Documentation
    • Week 12: Finishing up!

My first week will be rather short since I have an exam due in the middle of the week. After that I will also be away for a week not working on the project, I have no counted that week in the timeline above.

Sunday, 21 May 2017

Introduction

Hi! I'm Joel Dahne and I will be using this blog to share the progress of my work for Google Summer of Code 2017 under GNU Octave where I will be working on improving the interval package.

About me

Currently I'm a master student in mathematics at Uppsala University, Sweden, I'm just about to complete my first year and have one more to go. I first encountered interval numerics with my bachelor thesis, during which I tried to generalize the method described in this paper to higher dimensions. The code from the project is available on GitHub, https://github.com/Urathai/generateZeros. Working on that project ignited my interest for interval numerics and I want to work to increase its availability to the common user.

The project

In short my project is about adding support for N-dimensional arrays in the interval package. At the moment the package only has support for up to 2-dimensional arrays. Most of the time this is enough but sometimes it is limiting. The goal is to have the syntax identical to the one for normal arrays. Switching from floats to intervals should be as easy as adding $infsup$ in front of the variable.

During my time preparing I have also noticed that some of the folding functions (e.g. $sum$, $prod$ and $dot$) handle some edge cases differently than the standard implementation. Hopefully this can be resolved at the same time.

This blog

During my project I will try to update this blog with my current progress. It's my first time writing a blog so we will see how it goes. The next step for me is to create more detailed plan of how I should structure my work during the summer, this will most likely be the subject of my next post.