2012-04-29

Some of the problems during LadderQ implementation

When meeting with Kurt earlier this week, I was asked to put together some text detailing the sort of problems I've been solving during implementation. This post may be rather long and quite frankly uninteresting so I won't blame anyone if they choose not to read it. Those who enter here, abandon all hope.

TL;DR: Worked on LadderQueue, encountered difficulties and eventually overcame them...

Firstly, as I've mentioned before, the pseudocode is incredibly high-level at times, and low-level at others. For example, it contains lines like

transfer Top to rung 1 of Ladder
which is in itself non-trivial. Luckily this same operation is detailed elsewhere in the code but the same still
applies for other examples.

An example of low-level stuff is the manual incrementation and decrementation of sizes every time something is added to or removed from an array. When working in a modern language,
this is something that's almost always handled by the container itself. Care must however be taken when removing these variables, as the NRung variable, which indicates the number of
rungs currently in the ladder, should still be used in addition to the list containing the rungs. This because while the number of rungs in use, as indicated by NRung, can vary throughout
the program, the number of rungs in the structure will not vary because of performance reasons. You could reason that the ArrayList or whatever structure containing the rungs will do
the right thing
and won't re-size downwards immediately when removing a rung, but this discounts the possible performance advantages that can be achieved because the unused rungs are still in memory. The extra objects for each rung also don't have to be created. I do still keep a counter for the number of elements currently in the ladder part of the ladderQ (the rungs), because iterating over a list of lists of lists and adding elements isn't the fastest of things.

A pretty big problem I also have with the implementation is the fact that variables are almost never initialised. When talking about code like

while (TS < RCur[x] && x <= NRung)
    x++;

one can guess what the initial value for x should be (in this case 1, as the rungs are numbered from 1 onwards). This gets a lot trickier when it comes to the many arrays that are used throughout the program. For example, each of the rungs numbered 2 and larger are initially sized at 50 buckets (the first rung's size is determined at runtime but can still grow wildly) but it's perfectly possible that at some point in the program 300 buckets are needed in some rung (when the initial dequeueing into rung 1 is a bad estimation of the distribution in the first epoch). I solve this problem by simply checking if the structure is large enough and to grow it if it isn't, but it's not really ideal.

The pseudo code even contains a few things that are blatantly wrong, like for example the previously mentioned fragment of code

while (TS < RCur[x] && x <= NRung)
    x++;
that will access an element that is out of range on the last iteration unless rewritten to
while (x <= NRung && TS < RCur[x])
    x++;

But by far the most problematic are some general omissions in the code. Take for example a scenario in which three events are inserted in the queue, all having the same timestamp: 15. These will initially be inserted into top. When the first element is polled (or peek'd for that matter), the top rung will be moved to the first rung. This first rung will have to be created. In order to create a rung from top, the bucket width (width in time of a bucket) is calculated based on

Bucketwidth[1] = (MaxTS -- MinTS) / NTop;
which will be (15-15)/3 which is +INF. Now, I'll be the first to admit a scenario like this won't give you a good epoch because the code has no idea about the distribution of the incoming events, but that doesn't change that fact that something like this should be taken into account. A second problem with this same scenario is that the events with equal timestamps will obviously be in the same bucket. In the same dequeue operation where they were moved from top to ladder, the bucket containing all events will also be moved to the sorted bottom list. If that bucket is the only bucket, NRung should be set to zero when moving that bucket, or problems will occur with the next poll() operation trying to poll from an empty rung.

As I already explained on Tuesday, the problem I'm currently having is that when dequeueing an element, RCur is not updated. This can cause serious problems when enqueueing after that and dequeueing again. I already explained the problem to Kurt and Silas so I won't go into much detail, but suffice it to say that it's a bit of a tricky problem.

A little something extra, which isn't exactly a problem in the code but more of a problem I had, was that the peek() operation was not specified. Rather than the

x=poll();
offer(x);
I initially had, I now use the faster
x = poll();
bottom.add(0, x);
or equivalent, which adds the event to the first position in bottom. I don't know how often the peek() operation is used in the simulator, but it should be well worth the minor adjustment
anyhow. Checking the bottom list is the first thing that's done during a dequeue operation and it's impossible to know what the first event is without going through most of the logic in poll()
so this implementation should be more or less optimal. The only thing that could be faster is that I actually remove the event from the first position in bottom, after which I add it again. If
this proves to be a frequently used operation I can always either duplicate the code or use a strange function and some conditional logic that's either marginally faster or isn't.

Geen opmerkingen:

Een reactie posten