[HN Gopher] Our recent server issues
___________________________________________________________________
Our recent server issues

Author : timetraveller26
Score  : 94 points
Date   : 2021-12-22 19:06 UTC (3 hours ago)

(HTM) web link (lichess.org)
(TXT) w3m dump (lichess.org)

| than3 wrote:
| I expect that what they've pointed to as the cause is only part of
| the problem. We'll never know the full picture unless they share it.
|
| To be upfront, my experience with the people in charge of the
| organization there has left me without much goodwill. Nothing
| against Thibault personally; I think he's done some great things,
| but he seems to be busy with whatever he's interested in, and
| management isn't it, and he has some unprofessional people with
| access who work for the project/charity.
|
| I volunteered my services years ago as a System Administrator (no
| charge) with suggestions, but there were limited modes of
| communication, multiple issues went stale, and an issue was
| auto-closed with no response. They have issues that don't get
| addressed, and their process doesn't appear to be aimed at
| cultivating qualified volunteers or improving the bus factor of the
| project.
|
| To make things worse, when I expressed disagreement, along with
| constructive feedback, about one of their process decisions, one of
| the other devs with access apparently took offense, and the next day
| I found my lichess account had been edited by an admin without
| notice or notification. I could not log in (wrong password), the
| password and email for password recovery were changed, and trying to
| access the profile URI directly showed the account as banned.
|
| It seemed this was done out of spite, and definitely without any
| kind of due process. Appeals by email went unanswered within the
| 90-day cutoff I gave them. As a result I submitted a complaint to
| the French charities regulatory body and moved on, since the group
| wasn't worth wasting any more of my time.
| I haven't heard back, so who knows if anything came of what I
| reported.
|
| In my opinion, they've got more internal problems than they let on,
| and to me this is just spillover.
|
| It's unfortunate because any failure like this impacts so many
| people, but I don't find it surprising given my limited experience
| of the people there.
|
| schaefer wrote:
| Would you consider reading the excellent book "Working in Public:
| The Making and Maintenance of Open Source Software" by Nadia Eghbal?
|
| The data behind what "Open Source" projects look like differs from
| the popular culture narratives and assumptions about what they
| _should_ look like.
|
| I doubt there was justification for disabling your player account,
| and I'm sorry that happened to you.
|
| But it sounds like your expectations about code contributions and
| onboarding new volunteers may have been far from that project's
| reality.
|
| Shadonototra wrote:
| lichess is open source, if you want to contribute send your PR here:
| https://github.com/ornicar/lila
|
| I don't know what your motive is, but it doesn't seem to involve
| lichess's code ;)
|
| iliekcomputers wrote:
| I'm not sure I completely understand. They say that the only thing
| that was affected was the tournament because its events needed to be
| processed synchronously, but I remember the entire site being
| unavailable for people. Was that unrelated?
|
| On a side note, huge props to Lichess; the fact that they can
| compete with chess.com, which has so many resources behind it, is
| very impressive. Everyone who plays chess should consider becoming a
| patron.
|
| Santosh83 wrote:
| During the first crash, immediately after the initial start of the
| tournament, the entire site did indeed go offline for a few minutes.
| Even address resolution failed. Then things went smoothly for about
| an hour, after which the 2nd crash came.
| This one just seemed to affect the particular tournament
| (participants couldn't get fresh pairings) while the rest of the
| site was still working, as the article mentions.
|
| iliekcomputers wrote:
| Ah I see. That makes sense.
|
| Would be interesting to know what caused the entire site to go down
| in the beginning. Wonder if it was just too much traffic.
|
| nijave wrote:
| Anyone take a look into the code and see why it can't be
| parallelized? The bottom of the FAQ mentions that, but I'd think at
| least certain aspects should be parallelizable or at least be
| prioritizable (like maybe forgoing leaderboard updates to focus on
| more important events?)
|
| jb_s wrote:
| Good idea. Tho in the code itself (from a cursory glance) in L141
| _Sequencing(...)_ I see a bunch of nested maps - I don't know Scala
| but I think this may be a performance issue? Rather than
| hyperfocusing on parallelism or event systems etc., since that stuff
| is comparatively hard to solve, maybe refactoring this function/algo
| at the core of the pairing would have more bang for buck
|
| https://github.com/ornicar/lila/blob/98691c8901cc0e7d0f338f4...
|
| jeremyjh wrote:
| If you think about it, you can't really generate pairings in
| parallel because each thread would need write access to the entire
| pool to ensure no one is paired twice and that you have a consistent
| view of all the results to that point before creating a new pair.
| You could maybe create a lock for each participant, but that might
| actually be slower, would definitely be more difficult to reason
| about, and could lead to bugs just as serious under load.
|
| bo1024 wrote:
| It would be very interesting to hear about the technical details!
|
| EarthIsHome wrote:
| Further down in the article, there's a more technical explanation
| under the heading "Can you elaborate on the technical issue?"
|
| jph wrote:
| > Eventually there was no way of keeping up with the queue
|
| A chess congestion pile up...
| sounds like an event stream rook-ie mistake. :-)
|
| Seriously, congrats to Lichess for growing. It's an amazing site.
| Donate if you can.
|
| bryan0 wrote:
| Hikaru was live streaming the event, so you can see the series of
| failures and how they affected the tournament here:
| https://youtu.be/YKfvNl8UoxA
|
| powera wrote:
| The "easy" solution is to put a cap on tournament size.
|
| Apart from when Agadmator wants his fans to be in the same
| tournament as Magnus Carlsen etc., there is basically no need to
| hold chess tournaments with over 1000 players.
|
| assbuttbuttass wrote:
| Sounds like they need backpressure. Isn't that the usual solution to
| a queue growing without bound?
|
| jeremyjh wrote:
| You mean you want to pause the games that are in progress? The
| events are created by people finishing their games and queuing for
| the next pair. You could limit total participants but only if you
| know the limit ahead of time. Nothing else makes sense.
|
| c0balt wrote:
| Load shedding might also be a good solution
|
| progbits wrote:
| How would that work here?
|
| "Load" is players finishing games and requiring new match pairings /
| rating updates for the tournament to continue. You can tell them to
| wait, sure. But as long as the rate of games finishing exceeds the
| rate at which they can process them, I don't see how that would
| improve the situation; the tournament would be stuck anyway.
|
| One option is to e.g. limit the total number of players in the
| tournament up front, but they explicitly said they didn't want to do
| that.
|
| bcrosby95 wrote:
| Depends upon whether every event is strictly necessary for the
| proper functioning of the system, or if some are just nice-to-have.
|
| zaptheimpaler wrote:
| lol are you all just repeating scaling buzzwords? back-pressure,
| load shedding, what's next, horizontal scaling or maybe blockchain?
|
| jeremyjh wrote:
| This is where it helps to have domain knowledge before pontificating
| on someone else's architecture.
| It is an arena tournament. If you want people to join the tournament
| and get paired with games after finishing their current game you'll
| need those events. There are no "nice to haves".
___________________________________________________________________
(page generated 2021-12-22 23:00 UTC)