The funny thing about the toolchain is not that everything depends on it, but that certain packages are built twice before being released (though, I have to admit, I'm not 100% sure why we need that).
Circular dependencies, I would assume.
Yes, assuming you're using the package dependencies, most things won't depend explicitly on the toolchain, as that only really documents runtime dependencies in the main. Presumably you have to have some kind of build dependencies list or logic to ensure the relevant toolchain gets built before the package that needs it to be built?
Regarding locks: we currently lock at file level for some of the scripts, e.g. when a sanity check runs, no one should write to the package database, otherwise the sanity check will return false positives. However, some of the locks we have seem to be overcautious. On the other hand, we had problems with the database in the past when we relied on the database itself to lock things properly (e.g. when we allowed packages to be returned simultaneously).
I'd much prefer locks to be overcautious than undercautious, so long as they never deadlock. I'm having a little trouble seeing why new locks are needed here, to be honest. Perhaps I just need to re-read the OP, but for my sake at least, it would help if you could capture a use case where the tools should lock and currently don't. Seems to me that before the required toolchain has been built, the can-be-built queue should be empty other than some bits of toolchain, so as long as locking currently stops queue adds, queue SELECTs and other classes of dangerous SQL, that's okay.
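To make the file-level locking discussed above a bit more concrete, here is a minimal Python sketch. It is not the project's actual locking code: the helper name `database_lock` and the idea of a single lock file are assumptions for illustration, and it relies on `fcntl.flock`, which only exists on Unix-like systems. A sanity check would take the lock in shared mode, a writer in exclusive mode.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def database_lock(lock_path, exclusive=True):
    """Hold an advisory lock on lock_path for the duration of the block.

    Hypothetical helper: exclusive=True is meant for writers,
    exclusive=False for readers such as a sanity check.
    """
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    # Open in append mode so an existing lock file is never truncated.
    with open(lock_path, "a") as handle:
        fcntl.flock(handle, mode)  # blocks until the lock is available
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

Because the lock is released in a `finally` clause, a script that fails while holding it does not leave it behind. That addresses stale locks, though not deadlocks between several different locks taken in different orders.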
I'm less clear on the locking situation. There seem to be locks already implemented, but I'm not sure exactly what else you need.
I'd like to brainstorm some ideas to improve performance on that part.
The steps it currently performs:
trivial/fast tasks: return assignments which are already scheduled for that slave or which are manually forced onto that slave (not to be confused with "preferred by that slave")
create a temporary list of buildable packages (i.e. ones where all dependencies are met and which fit the architecture of the build slave)
remove all "wrong" packages from that list (packages that are currently being built by another build slave, and all non-toolchain packages iff a toolchain package is on that list)
order that list by different criteria (e.g. toolchain build order, priority, architecture, commit time, previous errors/build attempts, etc.)
hand out the first package of that (ordered) list
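The steps above can be sketched roughly as follows. This is an in-memory illustration only, not the real get-assignment code: the `Package` fields, the `get_assignment` signature, the sort key, and the choice to apply the toolchain filter before dropping in-progress packages are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # All fields are hypothetical stand-ins for what the database stores.
    name: str
    arch: str
    deps: set = field(default_factory=set)
    is_toolchain: bool = False
    toolchain_order: int = 0
    priority: int = 0
    commit_time: int = 0
    failed_trials: int = 0


def get_assignment(slave_arch, packages, built, in_progress, forced=None):
    """Return the package to hand out to a slave, or None if there is none."""
    # Step 1: already scheduled or manually forced assignments win.
    if forced is not None:
        return forced
    # Step 2: buildable = not built yet, all dependencies met, matching arch.
    candidates = [
        p for p in packages
        if p.name not in built and p.deps <= built and p.arch == slave_arch
    ]
    # Step 3a: if any toolchain package is on the list, keep only those.
    if any(p.is_toolchain for p in candidates):
        candidates = [p for p in candidates if p.is_toolchain]
    # Step 3b: drop packages another slave is already building.
    candidates = [p for p in candidates if p.name not in in_progress]
    # Step 4: order by several criteria (assumed: higher priority first,
    # fewer failed attempts first, older commits first).
    candidates.sort(
        key=lambda p: (p.toolchain_order, -p.priority, p.failed_trials, p.commit_time)
    )
    # Step 5: hand out the first package of the ordered list.
    return candidates[0] if candidates else None
```

The sketch covers only the five listed steps. Further rules, such as building only one toolchain package per architecture at a time, are left out.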
proposed improvement (ideas by abaumann):
make the temporary table permanent
hand out the k-th build assignment to the k-th slave
there are some problems I see arising with this:
How do we handle rebuilds of the toolchain? All other packages should be blocked while the toolchain is being rebuilt. Also, we should only ever build one toolchain package per architecture at a time. Ordering will also be interesting ...
When do we update the permanent table? On return of packages? Then we need a lock or something ...
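One way the permanent-table idea and the concerns above might fit together, as a hedged sketch using Python's `sqlite3` module rather than the real database: the table name `build_queue`, its columns, and both function names are hypothetical. The table is replaced inside a `BEGIN IMMEDIATE` transaction, which takes the write lock up front, so a refresh on package return cannot interleave with another writer. While any toolchain package is queued, only slave 0 receives work. Per-architecture queues are ignored here for brevity.

```python
import sqlite3


def open_queue(path=":memory:"):
    # isolation_level=None: no implicit transactions, we manage them ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS build_queue ("
        "position INTEGER PRIMARY KEY, "
        "package TEXT NOT NULL, "
        "is_toolchain INTEGER NOT NULL)"
    )
    return conn


def refresh_queue(conn, ordered_packages):
    """Atomically replace the queue with (name, is_toolchain) pairs, in order."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before touching rows
    try:
        conn.execute("DELETE FROM build_queue")
        conn.executemany(
            "INSERT INTO build_queue (position, package, is_toolchain) VALUES (?, ?, ?)",
            [(i, name, int(tc)) for i, (name, tc) in enumerate(ordered_packages)],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def assignment_for(conn, k):
    """Return the package for the k-th slave (0-based), or None."""
    toolchain = conn.execute(
        "SELECT package FROM build_queue WHERE is_toolchain = 1 "
        "ORDER BY position LIMIT 1"
    ).fetchone()
    if toolchain is not None:
        # Toolchain pending: one slave builds it, everyone else waits.
        return toolchain[0] if k == 0 else None
    row = conn.execute(
        "SELECT package FROM build_queue ORDER BY position LIMIT 1 OFFSET ?", (k,)
    ).fetchone()
    return row[0] if row else None
```

This does not settle the ordering question. It only shows that blocking non-toolchain packages can be expressed at hand-out time instead of being baked into the stored order.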
There is not much pressure to optimize get-assignment: it currently takes around 30 seconds to complete, which is not that long, though it could be optimized.
In contrast, it is very easy to break stuff there, and I have already invested a lot of time in getting the scheduling logic right.