Event Queueing ADT's
2012-06-18
Presentation
2012-05-27
A small logging intermezzo
| Nothing | No impact (obviously) |
| Calling an empty private final function with a static string in parameter | No impact |
| Calling an empty private final function with a dynamically composed string (eg "iteration" + i) | Large impact |
| Calling an empty private final function with a static string and integer (in separate params) | No impact |
| Calling logger.debug on an slf4j logger with attached but disabled logback logger with static string | 8,0 × 10-9 seconds per invocation |
| Calling logger.debug on an slf4j logger with attached but disabled logback logger with static string and int | 2,14 × 10-8 seconds per invocation |
| Calling logger.debug on an slf4j logger with attached but disabled logback logger with static string and String | 1,8 × 10-8 seconds per invocation |
The last items mean calling it like this: logger.debug("something {}", i); . In this table large impact means "I let it run for 40 minutes after which I gave up and spent the next 15 minutes trying to avoid my computer crashing because of thermal overload (lots of "don't you die on me" and mouth-to-fan resuscitation was involved).
The overhead caused by using slf4j statically with a static string isn't too bad, but when using the {} syntax, logging with an object in parameter seems to take about twice as long as logging only a simple string. Logging an integer seems to take even longer than that due to auto-boxing.
I also tried adding a if(logger.isDebugEnabled()) statement before the logger.debug() statement but this did not have any impact. So there's no need to do stuff like
if(logger.isDebugEnabled())
logger.debug("entering whatever");
as long as you're not doing anything funny in the logging statement. The difference is smaller than what I thought it to be earlier, but I'd still like to give the LadderQ the best possible chances. So I guess that I'll use a small interface and decide on compile time to either send debugging info to slf4j, or do nothing with it at all. This has no overhead and still allows me to keep the debug statements.
<Insert an obligatory clenbuterol zero zero zero zero joke about the "0,0000001038 seconds per invocation">
2012-05-26
Small progress report
In other news, the Ladder Queue seems to withstand my torture testing quite well (after some minor adjustments like the ability to deal with more than 5 rungs, which is very unlikely to come up in real simulation), so it seems that it's about ready to be plugged into the simulator.
2012-05-18
VM crashes and more...
To remedy this, I once again changed the script to send its stdout to file instead of PIPE, which reintroduced the "cutting off the output" bug I encoutered earlier. I tried a lot of different things to solve this problem, like changing the subprocess.call to subprocess.Popen, changing shell=True to shell=False, etc. None of this had any effect. I'm currently running again with a simple System.out.flush() after the print command in java code (I hope this flush is propagated from the java program to ant to maven to python). If that doesn't work, my backup solution is to send the timing output on stderr, which should be a pretty much empty stream and shouldn't have the same problems.
Ladder Queue work
Yesterday I also fixed the creation of a new rung from bottom. After this, my simulation-like test succeeded completely. Today I'm running something of a torture-test on the ladderQ, threating it like an ordinary queue (in something that's definitely not a discrete event simulation). I'm just inserting a large amount of completely randomly distributed [0-10000] events, removing a random amount of them, adding another large amount and repeating that for a few hundred times. The ladder queue will not perform well (in a time optimal manner) in a test like this, but it should be able to handle the test none the less.This exposed another bug in the pseudo code, and one I definitely did not see in advance. When inserting a new element that's smaller than the start of the lowest rung, it should be inserted into bottom. To prevent degradation to a linked list, a check is made to see if bottom isn't too full when inserting that element, and if it is, bottom is moved to a newly created rung. The code for this goes like this (verbatim):
create_new_rung(NBot)
transfer bottom to it
bucket_k = <formula>
insert into tail of rung NRung, bucket_k
<increase number of event in that bucket>
There's a number of problems with this code: first of all, create_new_rung creates a new rung from an existing rung, while there is no guarantee that at that time a single rung exists. Furthermore, it simply does not handle the creation of a new rung from bottom correctly! Rstart and Rcur for the new rung should obviously be adjusted to the values in bottom, but they are not. The instruction insert into tail of rung NRung, bucket_k is also very much incorrect, seeing as how there is no guarantee that that event will even fit in that rung, seeing as how the rung was made before adding the new event. I'll handle this by replacing the insert into tail of rung NRung, bucket_k with a recursive call to offer() of the new event.
2012-05-17
Better measurements
Fix for latest bug
The solution I'm using now is slightly cludgy, with each of the entities reporting back to the model when they're finished and the model doing something like this:
@Override
public synchronized void done() {
runningNodes--;
if(runningNodes == 0) {
ParallelEventSystem.mayFinish();
}
}
. This seems to work pretty well. If I wanted to I could even put the print of the time it took in this function, it would probably be more accurate. I'll try that next.
2012-05-07
SimulationBound & Benchmarking
I just tried it out, and my benchmark machine, when running with 10 nodes and the Java priority queue, in 1 minute about 824000 events can processed. So that comes to 8 240 000 events per minute or 137 333 events per second (running on 1 core).
A small problem when actually setting the number of events to process to 8 240 000, is that the simulation does not stop. The entities stop running but the simulation (process) keeps on running.
I guess I could use System.exit() or other dirtyness, but I'll ask Kurt at the next meeting what the preferred way of making the simulator stop is. One problem I can see with the simple System.exit() is that this exit would happen after a *single* Entity completes its work, but other Entities may still have work remaining.
2012-05-06
Creating a new rung because the first bucket of last rung
No major problems, other than the problems outlined in the previous post, and a small omission in the pseudocode (in recurse_rungs setting k = 0 after expanding a bucket to a rung).
Next up: fixing the creation of new rungs from an overful bottom.
Current work on the Ladder Queue
(You guys asked for that one...)
So, after fixing the RCur problem and finally getting a successful run of my test suite, I realised that at no point I actually had more than one rung in my ladderQ (it's a good thing multiple rungs are so hard to create, it means the ladderQ is reasonably efficient). So I forced the creation of a new rung by adding a lot of events that are really close together in time. This revealed (as expected) a number of other problems. For example, there are two ways of creating a new rung:
- The first bucket of the lowest rung is getting too full
- The bottom structure is getting too full
create_new_rung simply does something like this:
BucketWidth[NRung] = BucketWidth[NRung-1]
RSTart[NRung] = RCur[NRung] = RCur[NRung - 1]
which is a strange thing to do, since that way there's really no way for the bucketwidth of two rungs to differ. And that's a pretty essential feature (they even use an array to indicate this value differs for different rungs, so once again I don't know what they were thinking). It should be a reasonably easy problem to solve.
2012-05-01
First SleepModel results
EncapsulatedSkipList;140000000;27289561472
EncapsulatedSkipList;140000000;27263209214
EncapsulatedSkipList;140000000;27333339798
EncapsulatedSkipList;140000000;29876444971
EncapsulatedSkipList;140000000;33150127870
FibonacciHeapQueue;140000000;25657102324
FibonacciHeapQueue;140000000;25408097658
FibonacciHeapQueue;140000000;25290353285
FibonacciHeapQueue;140000000;26417257884
FibonacciHeapQueue;140000000;24674895865
EventHeapQueue;140000000;22657880920
EventHeapQueue;140000000;24321408297
EventHeapQueue;140000000;23273707683
EventHeapQueue;140000000;24240075311
EventHeapQueue;140000000;23326448113
CalendarQueue;140000000;26320928253
CalendarQueue;140000000;25328098246
CalendarQueue;140000000;25755167657
CalendarQueue;140000000;25315079319
CalendarQueue;140000000;26312306105
PairingHeapQueue;140000000;32619338904
PairingHeapQueue;140000000;31721119742
PairingHeapQueue;140000000;32086773030
PairingHeapQueue;140000000;30888228364
PairingHeapQueue;140000000;31549981466
SkipList;140000000;26642718242
SkipList;140000000;29466671062
SkipList;140000000;29355682538
SkipList;140000000;27473396963
SkipList;140000000;29299904675
JavaPriorityQueue;140000000;24293036390
JavaPriorityQueue;140000000;24281461792
JavaPriorityQueue;140000000;24290784409
JavaPriorityQueue;140000000;24262189242
JavaPriorityQueue;140000000;24499839511
SplayTreeQueue;140000000;26288893989
SplayTreeQueue;140000000;26309865599
SplayTreeQueue;140000000;26307518832
SplayTreeQueue;140000000;26505388409
SplayTreeQueue;140000000;27006209760
TreeMapQueue;140000000;23357232492
TreeMapQueue;140000000;23308374541
TreeMapQueue;140000000;24277818530
TreeMapQueue;140000000;24300141711
TreeMapQueue;140000000;24278979156
HashMapQueue;140000000;25264645807
HashMapQueue;140000000;25271931607
HashMapQueue;140000000;25299499125
HashMapQueue;140000000;25314163271
HashMapQueue;140000000;25292758506
which corresponds to
| EventHeapQueue | 96.0300322964% |
| TreeMapQueue | 99.9466222301% |
| JavaPriorityQueue | 100.0% |
| HashMapQueue | 104.124914536% |
| FibonacciHeapQueue | 104.599741327% |
| CalendarQueue | 106.028554794% |
| SplayTreeQueue | 108.312128402% |
| EncapsulatedSkipList | 112.525554292% |
| SkipList | 120.621484188% |
| PairingHeapQueue | 130.589112348% |
The EventHeapQueue and TreeMapQueue do a good job in both this synthetic benchmark and the actual simulation. HashMapQueue and the JavaPriorityQueue is about the same in both (reasonably good).
The CalendarQueue, which did quite good in the the practical benchmark, was a lot worse in the synthetic one. The PairingHeapQueue apparently just sucks all the time.
2012-04-29
Some of the problems during LadderQ implementation
TL;DR: Worked on LadderQueue, encountered difficulties and eventually overcame them...
Firstly, as I've mentioned before, the pseudocode is incredibly high-level at times, and low-level at others. For example, it contains lines like
transfer Top to rung 1 of Ladderwhich is in itself non-trivial. Luckily this same operation is detailed elsewhere in the code but the same still
applies for other examples.
An example of low-level stuff is the manual incrementation and decrementation of sizes every time something is added to or removed from an array. When working in a modern language,
this is something that's almost always handled by the container itself. Care must however be taken when removing these variables, as the NRung variable, which indicates the number of
rungs currently in the ladder, should still be used in addition to the list containing the rungs. This because while the number of rungs in use, as indicated by NRung, can vary throughout
the program, the number of rungs in the structure will not vary because of performance reasons. You could reason that the ArrayList or whatever structure containing the rungs will do
the right thing and won't re-size downwards immediately when removing a rung, but this discounts the possible performance advantages that can be achieved because the unused rungs are still in memory. The extra objects for each rung also don't have to be created. I do still keep a counter for the number of elements currently in the ladder part of the ladderQ (the rungs), because iterating over a list of lists of lists and adding elements isn't the fastest of things.
A pretty big problem I also have with the implementation is the fact that variables are almost never initialised. When talking about code like
while (TS < RCur[x] && x <= NRung)
x++;
one can guess what the initial value for x should be (in this case 1, as the rungs are numbered from 1 onwards). This gets a lot trickier when it comes to
the many arrays that are used throughout the program. For example, each of the rungs numbered 2 and larger are initially sized at 50 buckets (the first rung's size is
determined at runtime but can still grow wildly) but it's perfectly possible that at some point in the program 300 buckets are needed in some rung
(when the initial dequeueing into rung 1 is a bad estimation of the distribution in the first epoch). I solve this problem by simply checking if the structure is
large enough and to grow it if it isn't, but it's not really ideal.
The pseudo code even contains a few things that are blatantly wrong, like for example the previously mentioned fragment of code
while (TS < RCur[x] && x <= NRung)
x++;
that will access an element that is out of range on the last iteration unless rewritten to
while (x <= NRung && TS < RCur[x])
x++;
But by far the most problematic are some general omissions in the code. Take for example a scenario in which three events are inserted in the queue, all having the same timestamp: 15. These will initially be inserted into top. When the first element is polled (or peek'd for that matter), the top rung will be moved to the first rung. This first rung will have to be created. In order to create a rung from top, the bucket width (width in time of a bucket) is calculated based on
Bucketwidth[1] = (MaxTS -- MinTS) / NTop;which will be (15-15)/3 which is +INF. Now, I'll be the first to admit a scenario like this won't give you a good epoch because the code has no idea about the distribution of the incoming events, but that doesn't change that fact that something like this should be taken into account. A second problem with this same scenario is that the events with equal timestamps will obviously be in the same bucket. In the same dequeue operation where they were moved from top to ladder, the bucket containing all events will also be moved to the sorted bottom list. If that bucket is the only bucket, NRung should be set to zero when moving that bucket, or problems will occur with the next poll() operation trying to poll from an empty rung.
As I already explained on Tuesday, the problem I'm currently having is that when dequeueing an element, RCur is not updated. This can cause serious problems when enqueueing after that and dequeueing again. I already explained the problem to Kurt and Silas so I won't go into much detail, but suffice it to say that it's a bit of a tricky problem.
A little something extra, which isn't exactly a problem in the code but more of a problem I had, was that the peek() operation was not specified. Rather than the
x=poll(); offer(x);I initially had, I now use the faster
x = poll(); bottom.add(0, x);or equivalent, which adds the event to the first position in bottom. I don't know how often the peek() operation is used in the simulator, but it should be well worth the minor adjustment
anyhow. Checking the bottom list is the first thing that's done during a dequeue operation and it's impossible to know what the first event is without going through most of the logic in poll()
so this implementation should be more or less optimal. The only thing that could be faster is that I actually remove the event from the first position in bottom, after which I add it again. If
this proves to be a frequently used operation I can always either duplicate the code or use a strange function and some conditional logic that's either marginally faster or isn't.
2012-04-08
Latest results
The raw results I gathered:
TreeMapQueue;250000;44504685563
TreeMapQueue;250000;43510640683
TreeMapQueue;250000;43507232275
TreeMapQueue;250000;43493418310
TreeMapQueue;250000;45450831695
PairingHeapQueue;250000;126060167710
PairingHeapQueue;250000;125240863979
PairingHeapQueue;250000;145046297489
PairingHeapQueue;250000;137354103844
PairingHeapQueue;250000;144051341014
HashMapQueue;250000;52394058529
HashMapQueue;250000;51806601251
HashMapQueue;250000;51151331841
HashMapQueue;250000;53559766910
HashMapQueue;250000;52176779442
FibonacciHeapQueue;250000;54482560996
FibonacciHeapQueue;250000;54985676302
FibonacciHeapQueue;250000;54372282025
FibonacciHeapQueue;250000;53014851349
FibonacciHeapQueue;250000;52796625042
JavaPriorityQueue;250000;51551165898
JavaPriorityQueue;250000;51443365330
JavaPriorityQueue;250000;50457917413
JavaPriorityQueue;250000;52876494280
JavaPriorityQueue;250000;53129498732
SkipList;250000;58607951279
SkipList;250000;60014792829
SkipList;250000;58422165150
SkipList;250000;57934313472
SkipList;250000;60079309281
CalendarQueue;250000;42451213404
CalendarQueue;250000;44537189695
CalendarQueue;250000;42570999140
CalendarQueue;250000;40472274434
CalendarQueue;250000;42437191377
EventHeapQueue;250000;40552683898
EventHeapQueue;250000;39938462029
EventHeapQueue;250000;42589802688
EventHeapQueue;250000;42522441333
EventHeapQueue;250000;41810384506
EncapsulatedSkipList;250000;60108290120
EncapsulatedSkipList;250000;59988195877
EncapsulatedSkipList;250000;59373120898
EncapsulatedSkipList;250000;58392384034
EncapsulatedSkipList;250000;60109929789
SplayTreeQueue;250000;58798470148
SplayTreeQueue;250000;61035334794
SplayTreeQueue;250000;60905879898
SplayTreeQueue;250000;61560373083
SplayTreeQueue;250000;60811032674
which results in the following results (with the second column being percentage of the time it takes to execute that same experiment compared to the java priority queue):
| EventHeapQueue | 81.1046341585% |
| CalendarQueue | 82.3477270873% |
| TreeMapQueue | 84.4028256686% |
| JavaPriorityQueue | 100.0% |
| HashMapQueue | 101.213577876% |
| FibonacciHeapQueue | 105.472458436% |
| SkipList | 113.688895795% |
| EncapsulatedSkipList | 116.366322336% |
| SplayTreeQueue | 118.146464463% |
| PairingHeapQueue | 266.442283994% |
2012-04-01
Mounting with samba and new results
mount -t cifs -o username=ives,password=<passwd>,uid=ives,gid=ives //ix/users/Ives/stage /stage
This obviously doesn't work if I'm not in the same network (I don't feel like setting up a VPN).
Now, for the worse news: I ran the benchmark again earlier this week, with the following raw results:
JavaPriorityQueue;10000;6561024477
JavaPriorityQueue;10000;5404055641
JavaPriorityQueue;10000;5448148997
JavaPriorityQueue;10000;5518502627
JavaPriorityQueue;10000;5400359276
JavaPriorityQueue;10000;5430126603
JavaPriorityQueue;10000;5759644066
JavaPriorityQueue;10000;5401271246
JavaPriorityQueue;10000;5464453238
JavaPriorityQueue;10000;5407788605
PairingHeapQueue;10000;5446775232
PairingHeapQueue;10000;5423628469
PairingHeapQueue;10000;5418897748
PairingHeapQueue;10000;5421841291
PairingHeapQueue;10000;5409722011
PairingHeapQueue;10000;5729907979
PairingHeapQueue;10000;5416062530
PairingHeapQueue;10000;5477439257
PairingHeapQueue;10000;5433077898
PairingHeapQueue;10000;5412777948
TreeMapQueue;10000;5425671545
TreeMapQueue;10000;5426851101
TreeMapQueue;10000;5425298384
TreeMapQueue;10000;5399660301
TreeMapQueue;10000;5392418944
TreeMapQueue;10000;5391327447
TreeMapQueue;10000;5926422059
TreeMapQueue;10000;5393873311
TreeMapQueue;10000;5443363135
TreeMapQueue;10000;5744871062
SkipList;10000;5397133306
SkipList;10000;5675809273
SkipList;10000;5422119755
SkipList;10000;5392720363
SkipList;10000;5400829320
SkipList;10000;5422794354
SkipList;10000;5415324309
SkipList;10000;7051620337
SkipList;10000;5416224427
SkipList;10000;5420276478
SplayTreeQueue;10000;5395243251
SplayTreeQueue;10000;5412487676
SplayTreeQueue;10000;5545716916
SplayTreeQueue;10000;5442838378
SplayTreeQueue;10000;5430914414
SplayTreeQueue;10000;5610695557
SplayTreeQueue;10000;5464748603
SplayTreeQueue;10000;5408613093
SplayTreeQueue;10000;5501706374
SplayTreeQueue;10000;5405540312
HashMapQueue;10000;5396951336
HashMapQueue;10000;5384225368
HashMapQueue;10000;5393222775
HashMapQueue;10000;5398562791
HashMapQueue;10000;5368138462
HashMapQueue;10000;5419371366
HashMapQueue;10000;5769373695
HashMapQueue;10000;5489567527
HashMapQueue;10000;5438062242
HashMapQueue;10000;5873245080
EncapsulatedSkipList;10000;5621653702
EncapsulatedSkipList;10000;5439568704
EncapsulatedSkipList;10000;5412919542
EncapsulatedSkipList;10000;5461816898
EncapsulatedSkipList;10000;5462168806
EncapsulatedSkipList;10000;5402203002
EncapsulatedSkipList;10000;5425761816
EncapsulatedSkipList;10000;5393928593
EncapsulatedSkipList;10000;5490411424
EncapsulatedSkipList;10000;5548951710
FibonacciHeapQueue;10000;5666839845
FibonacciHeapQueue;10000;6225841498
FibonacciHeapQueue;10000;5429501483
FibonacciHeapQueue;10000;5423790030
FibonacciHeapQueue;10000;5815081783
FibonacciHeapQueue;10000;5437626501
FibonacciHeapQueue;10000;5462466700
FibonacciHeapQueue;10000;5400344311
FibonacciHeapQueue;10000;5480721465
FibonacciHeapQueue;10000;5426114305
CalendarQueue;10000;5380552101
CalendarQueue;10000;5488739278
CalendarQueue;10000;5993241530
CalendarQueue;10000;5388830581
CalendarQueue;10000;5406325242
CalendarQueue;10000;5446190891
CalendarQueue;10000;5401955885
CalendarQueue;10000;5407636115
CalendarQueue;10000;5406300099
CalendarQueue;10000;5422167002
EventHeapQueue;10000;5700755620
EventHeapQueue;10000;5400055933
EventHeapQueue;10000;4371602178
EventHeapQueue;10000;4549412219
EventHeapQueue;10000;6803269647
EventHeapQueue;10000;5401834240
EventHeapQueue;10000;5759731649
EventHeapQueue;10000;5439161377
EventHeapQueue;10000;5446489216
EventHeapQueue;10000;4398517379
.
This translates to the following relative results:
| CalendarQueue | 99.2563918127% |
| HashMapQueue | 99.4717906758% |
| SkipList | 99.4884038778% |
| PairingHeapQueue | 99.5499291959% |
| TreeMapQueue | 99.5874295653% |
| EventHeapQueue | 99.8350335131% |
| SplayTreeQueue | 99.9025243435% |
| JavaPriorityQueue | 100.0% |
| EncapsulatedSkipList | 100.250872379% |
| FibonacciHeapQueue | 100.262799402% |
The next step will be to see what's taking up time in a profiler and increasing the time limit a single experiment can take.
2012-03-24
Small status update
This setup, with the directory on my windows machine but accessible from xubuntu, is working well for me (works with my backup strategy, gives me svn access from both windows and linux and doesn't require any synchronization).
Next up is synchronizing with my benchmark machine (I guess I could try something with a network share but that's probably more trouble than it's worth), compiling and then running the benchmark again.
2012-03-23
This week's progress
- My windows machine has an svn repo but can't compile, nor benchmark
- My xubuntu machine can compile but doesn't have an svn repo (it seems to have been lost in transit) and also can't be used for benchmarking because it's a virtual machine
- My fedora machine can compile and benchmark, but is always controlled remotely which makes it inappropriate for development (although I'm normally a vim guy, I must admit I prefer eclipse for java development). Due to ridiculously unreliable WoL I can only access this machine during the weekend.
So you can imagine the juggling I do... I'll probably fix the Xubuntu machine for svn tomorrow, and might even spend some more time on getting WoL working on the fedora machine (although I'm reluctant to spend another afternoon on that problem).
2012-03-10
This week's progress
Furthermore, I also continued working on the LadderQ.
2012-03-02
This week's progress
So, since the last update (which was only four days ago mind you), I installed a 64 bit recent version of Xubuntu in a VM. I also installed the coroutine JDK and eclipse. By way of proving the coroutine JDK works I ran eclipse on it and it seemed to work fine. Minutes ago, I installed maven and copied over my entire GES tree.
Now, for the bad news...
When compiling the core project with the jdk7 pom.xml and
ives@vbox64:~/stage/branch/students/ives/core$ echo $JAVA_HOME
/home/ives/Programmas/corovm-jdk
ives@vbox64:~/stage/branch/students/ives/core$ which javac
/home/ives/Programmas/corovm-jdk/bin/javac
ives@vbox64:~/stage/branch/students/ives/core$ which java
/home/ives/Programmas/corovm-jdk/bin/java
I still get the same error I got before (on Windows):
[Invoker 160863352] Invoking be.ac.ua.comp.gestest.systemtest.de.multicore.SimpleTest.testSingleCoreDelayedEntityCalls
2012-03-02 23:24:15,914 [Core0] ERROR org.apache.commons.javaflow.bytecode.StackRecorder - Process.SuspendNow(..) fall through.
be.ac.ua.comp.core.de.EventSystemError: Process.SuspendNow(..) fall through.
at be.ac.ua.comp.core.de.AbstractProcess.suspendNow(AbstractProcess.java:157)
at be.ac.ua.comp.core.de.AbstractProcess.delay(AbstractProcess.java:317)
at be.ac.ua.comp.core.de.AbstractProcess.delay(AbstractProcess.java:282)
at be.ac.ua.comp.core.de.AbstractProcess.delayCurrent(AbstractProcess.java:375)
at be.ac.ua.comp.gestest.systemtest.de.multicore.SimpleTest$2.run(SimpleTest.java:389)
at be.ac.ua.comp.core.de.AbstractSchedulable.internalRun(AbstractSchedulable.java:87)
at be.ac.ua.comp.core.de.AbstractProcess$1.run(AbstractProcess.java:59)
at org.apache.commons.javaflow.bytecode.StackRecorder.execute(StackRecorder.java:92)
at org.apache.commons.javaflow.Continuation.continueWith(Continuation.java:152)
at org.apache.commons.javaflow.Continuation.continueWith(Continuation.java:125)
at be.ac.ua.comp.core.de.continuation.JFContinuation.start(JFContinuation.java:28)
at be.ac.ua.comp.core.de.AbstractProcess.start(AbstractProcess.java:56)
at be.ac.ua.comp.core.de.multicore.ParallelEventCore.run(ParallelEventCore.java:602)
at java.lang.Thread.run(Thread.java:734)
For some reason it seems like that test is still trying to use javaflow (and failing). I haven't mailed Kurt or Silas with this problem yet and I'll probably do so tomorrow. First I'm going to try to get some rest, maybe even some sleep.
2012-02-27
Intermediary Progress
Since I don't have access to the Fedora machine 4 days a week (I tried WoL stuff, but the combination closed laptop+fedora+WoL didn't work very well), I'll be continuing my development efforts on a virtual machine on my Windows host.
Unfortunately the only VM I still had lying around did not have enough free space for the DES and the coroutine JDK. After making some room and installing maven, I still got pretty weird errors. My coin dropped (only Flemish people will get that one) quickly and I realized that that VM was a 32bit VM, while the prebuilt coroutine JDK was 64bit software. A 64bit fedora image I tried was way too slow to be practical.
I'll be downloading a 64bit linux image tomorrow on campus to spare my home internet connection. By sheer luck the benchmark machine does run a 64-bit version so that's nice.
2012-02-24
This week's progress
Some of the problems I had during implementation:
- Pseudo pseudo code. By this I mean that the code near the end of the paper wasn't lowlevel enough to call it pseudocode. E.g, 4 lines out of the pseudocode algorithm:
transfer Top to rung 1 of Ladder bucket k = recurse rung(); sort events from bucket k and copy to Bottom return first event from Bottom - Missing initialisations. Initialising variables is apparently optional.
- Missing code (show me where MaxTS and MinTS are adjusted)
- Missing considerations. E.g.
Note that if MaxTs is equal to MinTs, it means all the events in Top have the same timestamp. In this article we consider the mean jump to be finite and positive. Thus, the likelihood of this occurring is extremely low.
After which they ignore the issue completely. Apparently nobody seems to consider the possibility that there might only be one event in the Event Queue at the moment the first event is pulled out (this event's timestamp == MaxTs == MinTS, because of which BucketWidth == 0, which causes a division by zero and crash). This might not occur in normal situations but if you're gonna write an ADT, it better be correct all of the time.
I suppose it could be worse.
On an unrelated note, I'll be changing to the new openJDK7+coroutines way of life from the current sun JDK6+javaflow either tomorrow or today on both my development and my benchmark machines. We're hoping this will influence the results in a positive way.2012-02-17
This week's progress
I've started implementation of the ladderQ, including a few tests to ensure the ladderQ is correct before plugging it into the event simulator. Implementation is rather elaborate and although the pseudocode near the end of the paper definitely helps, it's still incredibly far from an implementation in an existing language.
2012-02-07
SplayTreeQueue
E e = findMin(); remove(e); return e;to the contents of those functions. I still have no real idea what causes these errors...
I also optimised the removeMin operation a little (took advantage of the fact that findMin already splayed the tree on the the leftmost element, which means the remove() does not need to repeat this operation).
But unfortunately, no sigar. The SplayTreeQueue seems to be one of the slowest (definitely not faster than the JavaPriorityQueue) priority queues tested yet.
2012-02-05
More progress and first data
I also ran the benchmark on MARLIN (the dedicated pc) and got the following raw results (the second column is max simulation time and the third is wall clock time in nanoseconds):
CalendarQueue;1000;12463543564 CalendarQueue;1000;12431540564 CalendarQueue;1000;12314603248 CalendarQueue;1000;12342168941 CalendarQueue;1000;12379956154 EncapsulatedSkipList;1000;12233962662 EncapsulatedSkipList;1000;12435363549 EncapsulatedSkipList;1000;13524316579 EncapsulatedSkipList;1000;12451756549 EncapsulatedSkipList;1000;13430291485 SkipList;1000;12442345820 SkipList;1000;13471379407 SkipList;1000;13259910859 SkipList;1000;13423150185 SkipList;1000;13433629771 JavaPriorityQueue;1000;11427608446 JavaPriorityQueue;1000;11331713344 JavaPriorityQueue;1000;11486361044 JavaPriorityQueue;1000;11447420677 JavaPriorityQueue;1000;11489319111 PairingHeapQueue;1000;12627835823 PairingHeapQueue;1000;12461527869 PairingHeapQueue;1000;12635402319 PairingHeapQueue;1000;12766900475 PairingHeapQueue;1000;12403462163 TreeMapQueue;1000;11477720945 TreeMapQueue;1000;11412306791 TreeMapQueue;1000;11567066375 TreeMapQueue;1000;11426513260 TreeMapQueue;1000;12277413812 FibonacciHeapQueue;1000;12358072838 FibonacciHeapQueue;1000;11414073432 FibonacciHeapQueue;1000;11296198337 FibonacciHeapQueue;1000;12390780696 FibonacciHeapQueue;1000;12356444144 HashMapQueue;1000;12510711908 HashMapQueue;1000;12231090920 HashMapQueue;1000;12543661825 HashMapQueue;1000;12463697651 HashMapQueue;1000;12440836556 EventHeapQueue;1000;11464068161 EventHeapQueue;1000;11328706434 EventHeapQueue;1000;11411157250 EventHeapQueue;1000;11466460866 EventHeapQueue;1000;11439271687
Selecting the median of each queue and setting the Java Priority Queue to 100% gives (lower is better)
| EventHeapQueue | 99.9288137456% |
| JavaPriorityQueue | 100.0% |
| TreeMapQueue | 100.264690788% |
| FibonacciHeapQueue | 107.940858405% |
| CalendarQueue | 108.146249739% |
| EncapsulatedSkipList | 108.773468717% |
| HashMapQueue | 108.877781316% |
| PairingHeapQueue | 110.311625468% |
| SkipList | 117.259167491% |
So, the score for today:
Oracle: 1
Ives: 0
2012-02-02
Progress...
I also changed the simulation script that tries out different queueing strategies by passing different XML files to the simulator. It currently outputs simple CSV data, containing the nanosecond timestamps gathered from standard output of the simulator.
I also implemented a CalendarQueue (from here on called CQ), based on the one used by Ptolemy II, in the simulator. Porting it over was a bit harder than expected, since the Ptolemy II CQ wasn't generic ("templated" on the event), used all sorts of internal Ptolemy II classes like Time and most of all, it uses a "smart" way of structuring the CQ. This structuring takes into account when the contained events fire to structure the CQ.
2012-01-28
Returning...
I've suspected for a while that a big part of the reason I was getting a lot of variance in my benchmark results was that I'm running them on my i7 laptop with Intel Turbo Boost. I downloaded a small intel gadget that shows your current processor speed (which isn't the one reported through eg Process Explorer) and it showed my clock speed changing at least once a second. Although it should certainly be possible to benchmark on this machine, I'm afraid I'd either have to reboot after every test, run really long tests, run in a temperature controlled environment, etc.
I decided it might be a good idea to try benchmarking on another machine, so I spent most of yesterday installing Fedora 16 on a Core2Duo laptop (I won't be working directly from this machine but controlling it through SSH and SCP which makes Linux preferrable over Windows). I won't go into the details of yesterday, but just imagine the fun I had with a BIOS that seems to be broken for USB boot, resizing partitions, corrupt MBR's, a machine that refuses to boot even from CDROM, finding out "not enough free space" meant I crossed the limit of 4 primary partitions per MBR based drive, programs crashing while they're partitioning my hard drive, and lots more. However, All's Well That Ends Well and I hope to resume doing more useful work tomorrow or the day after.
2011-11-25
This week's progress
I'll try to meet with Kurt in the beginning of next week.
2011-11-18
This week's progress
- Finished and pretty much rewrote the Queue based on a ConcurrentSkipListMap. Originally this queue used keys that were generated from the events it received (not that quick a task because of the whole ProcessInterface inheritance check in SchedulableMulticoreComparator). These keys then implemented SchedulableInterface so they could be used without rewriting SchedulableMulticoreComparator.
I later realised that it would be much better to use the events as keys themselves and map them to something like an Integer (putting null in as the corresponding value causes problems with the delete operation on the queue...)
- Found a simple SkipList implementation on the internet and transformed it so it can be used (eg make it use a comparator).
- Made the queue used a field in the configuration that cna be changed at runtime. An enumeration is constructed from this field and later passed to a Factory that constructs a Queue used in the core.
- Made a little python script that runs the simulator with different queues.
2011-11-11
This week's progress
Surprisingly enough, after installing the latest JDK6, Eclipse Indigo, Maven and Subversion, actually getting the GES to run was surprisingly easy. I even managed to get the relevant parts in Eclipse without many issues.
I did have some trouble getting VisualVM to run, as evidenced by the previous blogpost. I'm also looking forward to the conversion from svn to hg/git/bzr/anything but svn. I really don't like svn and it doesn't seem to like me either.
Furthermore, I also made an EventQueue interface that's easier to implement than an AbstractQueue and have started thinking about implementing a HashMapEventQueue. I'm still figuring out what values are known when (SchedulableMulticoreComparator is a major influence) for an event, and how to combine them into a unique and sortable key.
Application not visible in VisualVM
The VisualVM FAQ talks about how you can only see java applications that are started by the same user and running on the same VM, so I first uninstalled three of my JVM's (JDK 32bit, JRE 32 bit, JRE 64bit leaving only JDK64 bit) to make absolutely sure the same VM was being used.
After this, still no GES in VisualVM. I used Process Explorer to make sure maven wasn't doing any voodoo (eg. switching users, it would be weird but not unseen), but no, both VisualVM and the GES were running from the same JVM executable nd were started by the same user. Moments later I realised what was going on:
- I was starting VisualVM from explorer.exe
- I was starting mvn from powershell or a cygwin bash
- When starting a java application, the current process is forked and the JVM will run as a child process of the current process. So even though visualVM and the GES were both running from the same executable by the same user, the JVM was not in fact the same one, just one that was similar.
2011-11-04
This week's progress
2011-10-28
Weekly Summary
- Facsimile Simulation Library
- Tortuga
- SimPy
- GTNetS
- SimGrid
- JiST
- Ptolemy II
- OMNeT++
- GridSim
- NS-3
Facsimile Simulation Library
The relevant code is written in the Scala file /libfacsimile/src/main/scala/org/facsim/facsimile/engine/Simulation.scala. It uses a scala.collection.mutable.PriorityQueue which is implemented using a Heap.
Tortuga
EventManager interface.
- A Calendar implementation in
delegate-src/src/org/mitre/sim/event/calendar/CalendarImpl.javathat's based on the original paper but has discouraging performance related comments. - A HashSet implementation in
delegate-src/src/org/mitre/sim/event/hashSet/HashSetImpl.javathat uses Java's built in hashset.The important part (dequeueing) is something like this (GPLv2 licensed, © MITRE corporation):
Suffice it to say that evaluating every single event at every single step of the simulation will not get you good performance.Event result = null; double smallestTime = Double.POSITIVE_INFINITY; for (Iterator i = db.iterator(); i.hasNext(); ) { Event nextEvent = (Event)i.next(); if (nextEvent.getNextTime() < smallestTime) { smallestTime = nextEvent.getNextTime(); result = nextEvent; } } db.remove(result); - A TreeMap based implementation in
delegate-src/src/org/mitre/sim/event/treeMap/TreeMapImpl.java. As you might have guessed, this uses the java.util.TreeMap<K,V>. I'm not sure what kind of tree it uses internally, but if it's not a binary heap, performance should be close.
I also just realised that I apparently switch from Dutch to English and back every a few posts, so eh, sorry about that. I honestly didn't notice.
2011-10-27
SimPy
SimPy/Simulation.py.
2011-10-26
GTNetS
SRC/scheduler-map.{cc,h},
SC/scheduler.{cc,h}). Om een deterministische sortering te
verzekeren, en geen problemen te krijgen met gelijke keys voor verschillende
events wordt er in het geval van een tie in execution slice gekeken naar een
uniek event ID om te weten welk van de events het eerst moet worden uitgevoerd.
SimGrid
Maargoed, SimGrid heeft de mogelijkheid om onder andere NS3 en GTNets (experimenteel) te gebruiken, maar bezit ook over zijn eigen simulatiecore.
Deze core, SURF genaamd, gebruikt een wiskundig model dat niet in een ADT te
omvatten is (zie onder andere src\surf\surf.c meerbepaald functie
surf_solve). Er wordt verder hier en daar wel hier en daar gebruik gemaakt van
een generieke heap (src\xbt\heap.c), maar er zijn geen "fancy"
datastructuren zoals calendar queues te vinden.
2011-10-25
JiST
src/jist/runtime/Scheduler.java. Currently only the
heap is actually used by the simulator; to use the Calendar queue one needs to recompile the source
code after commenting out one line and uncommenting another in src/jist/runtime/Controller.java.
Ptolemy II
OMNeT++
De code is geschreven in C/C++ en is een binaire heap gebaseerd op Handbook of Algorithms and Data Structures, Gaston H. Gonnet, pp. 273-274. De code is terug te vinden in src/sim/cmsgheap.cc.
GridSim
GridSim gebruikt van het gedateerde (anno 1995) SimJava2 als zijn DES. Deze maakt op zijn beurt gebruik van 2 verschillende queues.
De eerste queue is die waarin toekomstige gebeurtenissen terecht komen. Deze is geïmplementeerd
als een standaard java.util.TreeSet.
De tweede queue is de queue waarin uitgestelde gebeurtenissen geplaatst worden. Deze is op zijn beurt geïmplementeerd als een java.util.LinkedList.
Veel valt hier niet over te zeggen.
NS-3
Er zijn verschillende queues geimplementeerd, allen in C++. Voor zover ik kan zien
geen fancy automatisch switching tussen queue implementaties ofzo. De queue
kan gekozen worden met --SchedulerType=MyTypeId command line
switch, default is de std::map queue.
- Calendar Queue ( src/core/model/calendar-scheduler.{cc,h} ). Implementatie
van het originele "Calendar Queues: A Fast O(1) Priority Queue Implementation for
the Simulation Event Set Problem", Randy Brown met originele policy. Merk op: veel trager dan verwacht door de auteur, waarschijnlijk door slechte resizing policies. Een
stuk trager dan bv de std::map queue.
- Heap queue ( src/core/model/heap-scheduler.{cc,h} ).
Straightforward binaire heap.
- std::list queue ( src/core/model/list-scheduler.{cc,h} ).
Triviale implementatie.
- std::map queue ( src/core/model/map-scheduler.{cc,h} ).
Gebaseerd op een idee in de Georgia Tech Network Simulator (GTNetS). Dit
vereist wel dat er op de keys een absolute ordering zit (zonder
duplicates).
- NS-2 Calendar Queue (
src/core/model/ns2-calendar-scheduler.{cc,h} ). De NS-2 Calendar
Scheduler die iets vager is dan de NS-3 Scheduler.
- Realtime Scheduler (
src/core/model/realtime-simulator-impl.{cc,h} en
src/core/model/wall-clock-synchronizer.{cc,h} ). Zie
http://www.nsnam.org/docs/manual/html/realtime.html. Minder relevant voor
onze doeleinden afaik.
Evaluatie van verschillende Discrete Event Simulatoren
2011-10-14
On this blog...
Stay tuned.