[HN Gopher] Our recent server issues
       ___________________________________________________________________
        
       Our recent server issues
        
       Author : timetraveller26
       Score  : 94 points
       Date   : 2021-12-22 19:06 UTC (3 hours ago)
        
 (HTM) web link (lichess.org)
 (TXT) w3m dump (lichess.org)
        
       | than3 wrote:
       | I expect that what they've pointed to as the cause is only part
       | of the problem. We'll never know the full picture unless they
       | share it.
       | 
       | Being upfront, my experience of the people in charge of the
       | organization there doesn't have much goodwill left. Nothing
       | against Thibault personally, I think he's done some great things
       | but seems to be busy with whatever he's interested in and
       | management isn't it, and he has some unprofessional people with
       | access that work for the project/charity.
       | 
       | I had volunteered my services years ago as a System Administrator
       | (no charge) with suggestions, but met limited modes of
       | communication, multiple issues going stale, and no response
       | before an issue was auto-closed. They have issues that don't get
       | addressed, and their process doesn't appear to be aimed at
       | cultivating qualified volunteers or improving the bus factor of
       | the project.
       | 
       | To make things worse, when I expressed disagreement with
       | constructive feedback regarding one of their process decisions,
       | one of the other devs with access apparently took offense and
       | the next day I found my lichess account had been edited by an
       | admin without notice or notification. I could not log in (wrong
       | password), the password and email for password recovery were
       | changed, and trying to access the profile URI directly showed the
       | account as banned.
       | 
       | It seemed this was done out of spite, and definitely without any
       | kind of due process. Appeals by email went unanswered within the
       | 90 day cutoff I gave them. As a result I submitted a complaint to
       | the French charities regulatory body and moved on since the group
       | wasn't worth wasting any more of my time. I haven't heard back so
       | who knows if anything came of what I reported.
       | 
       | In my opinion, they've got more internal problems than they let
       | on, and to me this is just spillover.
       | 
       | It's unfortunate because any failure like this impacts so many
       | people, but I don't find it surprising given my limited
       | experience of the people there.
        
         | schaefer wrote:
         | Would you consider reading the excellent book "Working in
         | Public: The Making and Maintenance of Open Source Software" by
         | Nadia Eghbal?
         | 
         | The data behind what "Open Source" projects look like differs
         | from the popular culture narratives and assumptions about what
         | they _should_ look like.
         | 
         | I doubt there was justification for disabling your player
         | account, and I'm sorry that happened to you.
         | 
         | But it sounds like your expectations about code contributions,
         | and onboarding new volunteers may have been far from that
         | project's reality.
        
         | Shadonototra wrote:
         | lichess is open source, if you want to contribute send your PR
         | here: https://github.com/ornicar/lila
         | 
         | i don't know what your motive is, but it doesn't seem to
         | involve lichess's code ;)
        
       | iliekcomputers wrote:
       | I'm not sure I completely understand. They say that the only
       | thing that was affected was the tournament because its events
       | needed to be processed synchronously, but I remember the entire
       | site being unavailable for people. Was that unrelated?
       | 
       | On a side note, huge props to Lichess, the fact that they can
       | compete with chess.com which has so many resources behind it is
       | very impressive. Everyone who plays chess should consider
       | becoming a patron.
        
         | Santosh83 wrote:
         | During the first crash immediately after the initial start of
         | the tournament, the entire site did indeed go offline for a few
         | minutes. Even address resolution failed. Then things went
         | smoothly for about an hour after which the 2nd crash came. This
         | one just seemed to affect the particular tournament
         | (participants couldn't get fresh pairings) while the rest of
         | the site was still working, as the article mentions.
        
           | iliekcomputers wrote:
           | Ah I see. That makes sense.
           | 
           | Would be interesting to know what caused the entire site to
           | go down in the beginning. Wonder if it was just too much
           | traffic.
        
       | nijave wrote:
       | Anyone take a look into the code and see why it can't be
       | parallelized? The bottom of the FAQ mentions that but I'd think
       | at least certain aspects should be parallelizable or at least be
       | prioritizable (like maybe forgoing leaderboard updates to focus
       | on more important events?)
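
A minimal sketch of the prioritization idea in the comment above, assuming events can be tagged by kind. The event kinds, the priority numbers, and the `EventQueue` class are hypothetical illustrations, not anything taken from the lila codebase.

```python
import heapq
import itertools

# Hypothetical event kinds and priorities; a lower number is handled first.
PRIORITY = {"pairing": 0, "rating_update": 1, "leaderboard": 2}


class EventQueue:
    """Priority queue that keeps arrival order among events of equal priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def push(self, kind, payload):
        entry = (PRIORITY[kind], next(self._counter), kind, payload)
        heapq.heappush(self._heap, entry)

    def pop(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


def drain_in_priority_order(events):
    """Push every (kind, payload) event, then pop them all in priority order."""
    q = EventQueue()
    for kind, payload in events:
        q.push(kind, payload)
    return [q.pop() for _ in range(len(q))]
```

This only reorders work. If the total event rate stays above what the processor can handle, the low-priority events would simply wait indefinitely.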
        
         | jb_s wrote:
         | Good idea. Though in the code itself (from a cursory glance),
         | in L141 _Sequencing(...)_ I see a bunch of nested maps. I don't
         | know Scala, but I think this may be a performance issue?
         | Rather than hyperfocusing on parallelism or event systems,
         | which are comparatively hard to solve, maybe refactoring this
         | function/algo at the core of the pairing would give more bang
         | for the buck.
         | 
         | https://github.com/ornicar/lila/blob/98691c8901cc0e7d0f338f4...
        
         | jeremyjh wrote:
         | If you think about it, you can't really generate pairings in
         | parallel because each thread would need write access to the
         | entire pool to ensure no one is paired twice and that you have
         | a consistent view of all the results to that point before
         | creating a new pair. You could maybe create a lock for each
         | participant but that might actually be slower, and would
         | definitely be more difficult to reason about and could lead
         | to bugs just as serious under load.
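
A small sketch of why the comment above treats pairing as a sequential step. The pairing rule (sort by rating, pair neighbours) and the function name are assumptions made for illustration; this is not the site's actual algorithm. What matters is that the function reads and mutates the whole waiting pool, so two copies running at once on the same pool would need to be serialized to avoid pairing someone twice.

```python
def pair_waiting_players(waiting):
    """Pair players from a shared pool in one sequential pass.

    `waiting` maps a player name to a rating. Players with adjacent ratings
    are paired (an illustrative rule only). Paired players are removed from
    `waiting`, so nobody can be paired twice by a later pass.
    """
    # Sort by rating, then by name so the result is deterministic.
    ordered = sorted(waiting, key=lambda name: (waiting[name], name))
    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        a, b = ordered[i], ordered[i + 1]
        pairs.append((a, b))
        del waiting[a]
        del waiting[b]
    return pairs
```

With an odd number of players, the highest-rated one stays in the pool and waits for the next pass.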
        
       | bo1024 wrote:
       | It would be very interesting to hear about the technical details!
        
         | EarthIsHome wrote:
         | Further down in the article, there's a more technical
         | explanation under the heading "Can you elaborate on the
         | technical issue?"
        
       | jph wrote:
       | > Eventually there was no way of keeping up with the queue
       | 
       | A chess congestion pile up... sounds like an event stream rook-ie
       | mistake. :-)
       | 
       | Seriously congrats to Lichess for growing. It's an amazing site.
       | Donate if you can.
        
       | bryan0 wrote:
       | Hikaru was live streaming the event so you can see the series of
       | failures and how they affected the tournament here:
       | https://youtu.be/YKfvNl8UoxA
        
       | powera wrote:
       | The "easy" solution is to put a cap on tournament size.
       | 
       | Apart from when Agadmator wants his fans to be in the same
       | tournament as Magnus Carlsen etc., there is basically no need to
       | hold chess tournaments with over 1000 players.
        
       | assbuttbuttass wrote:
       | Sounds like they need backpressure. Isn't that the usual solution
       | to a queue growing without bound?
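
For readers unfamiliar with the term, here is a minimal sketch of backpressure using a bounded queue from Python's standard library. The `offer` helper and the queue size are hypothetical. When the queue is full, the producer is told so immediately instead of the backlog growing without bound.

```python
import queue


def offer(q, event):
    """Try to enqueue without blocking; return False when the queue is full."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        return False
```

A `False` result only helps if the producer is able to slow down. Whether that is possible in an arena tournament is what the replies in this thread dispute.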
        
         | jeremyjh wrote:
         | You mean you want to pause the games that are in progress? The
         | events are created by people finishing their games and queuing
         | for the next pair. You could limit total participants but only
         | if you know the limit ahead of time. Nothing else makes sense.
        
         | c0balt wrote:
         | Load shedding might also be a good solution
        
           | progbits wrote:
           | How would that work here?
           | 
           | "Load" is players finishing games and requiring new match
           | pairings / rating updates for the tournament to continue. You
           | can tell them to wait, sure. But as long as the rate of games
           | finishing exceeds the rate at which they can process them I
           | don't see how that would improve the situation, the
           | tournament would be stuck anyway.
           | 
           | One option is to eg. limit the total number of players in the
           | tournament up front but they explicitly said they didn't want
           | to do that.
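
The rate argument in the comment above can be checked with simple arithmetic. The sketch below uses made-up rates, not figures from the incident, and the function name is hypothetical. It tracks the backlog second by second when events arrive at one constant rate and are processed at another.

```python
def backlog_over_time(arrivals_per_sec, processed_per_sec, seconds):
    """Return the queue length at the end of each second, starting from empty."""
    backlog = 0
    history = []
    for _ in range(seconds):
        # The backlog changes by the net rate each second and cannot go negative.
        backlog = max(0, backlog + arrivals_per_sec - processed_per_sec)
        history.append(backlog)
    return history
```

With 120 arrivals and 100 processed per second, the backlog grows by 20 every second no matter how long any single event is asked to wait. Only lowering the arrival rate or raising the processing rate changes the slope.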
        
             | bcrosby95 wrote:
             | Depends on whether every event is strictly necessary for
             | the proper functioning of the system, or if some are just
             | nice-to-have.
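
A sketch of what shedding only the nice-to-have events could look like, under the assumption that some event kinds really are optional. That assumption is contested elsewhere in this thread. The `SheddingQueue` class, the `OPTIONAL_KINDS` set, and the threshold are all hypothetical.

```python
from collections import deque

# Hypothetical classification; which events are optional is an assumption.
OPTIONAL_KINDS = {"leaderboard"}


class SheddingQueue:
    """FIFO queue that drops optional events once it is too long."""

    def __init__(self, shed_above):
        self.shed_above = shed_above  # queue length at which shedding starts
        self.events = deque()
        self.dropped = 0

    def push(self, kind, payload):
        """Enqueue the event, or drop it if it is optional and the queue is long."""
        if kind in OPTIONAL_KINDS and len(self.events) >= self.shed_above:
            self.dropped += 1
            return False
        self.events.append((kind, payload))
        return True
```

Required events are always accepted, so this bounds the queue only to the extent that optional events make up the excess load.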
        
               | zaptheimpaler wrote:
               | lol are you all just repeating scaling buzzwords?
               | back-pressure, load shedding, what's next, horizontal
               | scaling or maybe blockchain?
        
               | jeremyjh wrote:
               | This is where it helps to have domain knowledge before
               | pontificating on someone else's architecture. It is an
               | arena tournament. If you want people to join the
               | tournament and get paired with games after finishing
               | their current game you'll need those events. There are no
               | "nice to haves".
        
       ___________________________________________________________________
       (page generated 2021-12-22 23:00 UTC)