[HN Gopher] Programming Language Memory Models
       ___________________________________________________________________
        
       Programming Language Memory Models
        
       Author : thinxer
       Score  : 94 points
       Date   : 2021-07-06 16:22 UTC (6 hours ago)
        
 (HTM) web link (research.swtch.com)
 (TXT) w3m dump (research.swtch.com)
        
       | raphlinus wrote:
       | A GPU followup to this article.
       | 
        | While sequentially consistent semantics are efficient to
        | implement on CPUs, that seems to be much less true on GPUs.
        | Thus, Vulkan omits sequential consistency entirely and provides
        | only acquire/release semantics[1].
       | 
        | It is extremely difficult to reason about programs using these
        | advanced memory semantics. For example, there is a discussion
        | about whether a spinlock implemented in terms of acquire and
        | release can be reordered in a way that introduces deadlock (see
        | the reddit discussion linked from [2]). I was curious enough
        | about this that I tried to model it in CDSChecker, but did not
        | get definitive results (the deadlock checker in that tool is
        | enabled for mutexes provided by the API, but not for mutexes
        | built out of primitives). I'll also note that AcqRel semantics
        | are not provided by the Rust version of compare_exchange_weak
        | (perhaps a nit on TFA's assertion that Rust adopts the C++ memory
        | model wholesale), so if acquire to lock the spinlock is not
        | adequate, it would likely need to go to SeqCst.
       | 
       | Thus, I find myself quite unsure whether this kind of spinlock
       | would work on Vulkan or would be prone to deadlock. It's also
       | possible it could be fixed by putting a release barrier before
       | the lock loop.
       | 
       | We have some serious experts on HN, so hopefully someone who
       | knows the answer can enlighten us - mixed in of course with all
       | the confidently wrong assertions that inevitably pop up in
       | discussions about memory model semantics.
       | 
        | [1]: https://www.khronos.org/blog/comparing-the-vulkan-spir-v-mem...
       | 
       | [2]: https://rigtorp.se/spinlock/
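        |
        | To make the pattern concrete, here is a rough sketch of the kind
        | of spinlock in question, written with Java's acquire/release
        | atomics rather than Vulkan/SPIR-V ones (purely illustrative;
        | whether acquire-on-lock plus release-on-unlock is sufficient is
        | exactly the open question above):
        |
        |   import java.util.concurrent.atomic.AtomicBoolean;
        |
        |   final class SpinLock {
        |       private final AtomicBoolean locked = new AtomicBoolean(false);
        |
        |       void lock() {
        |           // Acquire on the successful CAS orders the critical
        |           // section after the lock acquisition.
        |           while (!locked.weakCompareAndSetAcquire(false, true)) {
        |               // Spin on a cheaper read until the lock looks free.
        |               while (locked.getOpaque()) {
        |                   Thread.onSpinWait();
        |               }
        |           }
        |       }
        |
        |       void unlock() {
        |           // Release publishes the critical section's writes.
        |           locked.setRelease(false);
        |       }
        |   }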
        
         | raphlinus wrote:
          | Also: it remains difficult to fully nail down the semantics of
          | sequential consistency, especially when it's mixed with other
          | memory semantics. Next time Russ updates his article, he should
          | very likely add a reference to Repairing Sequential Consistency
          | in C/C++11[1].
         | 
         | [1]: https://plv.mpi-sws.org/scfix/full.pdf
        
         | dragontamer wrote:
         | GPU-spinlocks are a bad idea, unless the spinlock is applied
         | over the entire Thread-group.
         | 
          | Even then, I'm pretty sure the spinlock is a bad idea, because
          | you probably should be using the GPU as a coprocessor and
          | enforcing "orderings" over CUDA Streams or OpenCL Task Graphs.
          | The kernel-spawn and kernel-end mechanism provides your
          | synchronization functionality ("happens-before") when you need
          | it.
         | 
         | ---------
         | 
          | From there on out: the GPU low-level synchronization of choice
          | is the thread-barrier (which can extend beyond a wavefront,
          | but only up to a block).
         | 
         | --------
         | 
         | So that'd be my advice: use a thread-barrier at the lowest
         | level for thread blocks (synchronization between 1024 threads
         | and below). And use kernel-start / kernel-end graphs (aka: CUDA
         | Stream and/or OpenCL Task Graphs) for synchronizing groups of
         | more than 1024 threads together.
         | 
         | Otherwise, I've done some experiments with acquire/release and
         | basic lock/unlock mechanisms. They seem to work as expected.
         | You get deadlocks immediately on older hardware because of the
         | implicit SIMD-execution (so you want only thread#0 or active-
         | thread#0 to perform the lock for the whole wavefront / thread
         | block). You'll still want to use thread-barriers for higher
         | performance synchronization.
         | 
         | Frankly, I'm not exactly sure why you'd want to use a spinlock
         | since thread-barriers are simply higher performance in the GPU
         | world.
        
           | raphlinus wrote:
           | In _general_ spinlocks are a bad idea, but you do see them in
           | contexts like decoupled look-back. As you say, thread
            | granularity is a problem (unless you're on CUDA on Volta+
           | hardware, which has independent thread scheduling), so you
           | want threadgroup or workgroup granularity.
           | 
           | In any case, I'm interested in pushing the boundaries of
           | lock-free algorithms. It is of course easy to reason about
           | kernel-{start/end} synchronization, but the granularity may
           | be too coarse for some interesting applications.
        
             | dragontamer wrote:
             | This is the first time I've heard of the term "decoupled
             | look-back". But I see that it refers to CUB's
             | implementation of device-wide scan.
             | 
              | I briefly looked at the code, and came across:
              | https://github.com/NVIDIA/cub/blob/main/cub/agent/agent_scan...
             | 
             | I'm seeing lots of calls to "CTA_SYNC()", which ends up
             | being just a "__syncthreads" (a simple thread-barrier).
              | See:
              | https://github.com/NVIDIA/cub/blob/a8910accebe74ce043a13026f...
             | 
             | I admit that I'm looking rather quickly though, but... I'm
             | not exactly seeing where this mysterious "spinlock" is that
             | you're talking about. I haven't tried very hard yet but
             | maybe you can point out what code exactly in this
             | device_scan / decoupled look-back uses a spinlock? Cause
             | I'm just not seeing it.
             | 
             | ----------
             | 
             | And of course: a call to cub's "device scan" is innately
             | ordered to kernel-start / kernel-end. So there's your
             | synchronization mechanism right there and then.
        
               | raphlinus wrote:
                | I don't think CUB is doing decoupled look-back; the
                | reference you want is:
                | https://research.nvidia.com/publication/single-pass-parallel...
               | 
               | It doesn't use the word "spin" but repeated polling (step
               | 4 in the algorithm presented in section 4.1, particularly
               | when the flag is X) is basically the same.
        
               | dragontamer wrote:
               | 3rd paragraph:
               | 
               | > In this report, we describe the decoupled-lookback
               | method of single-pass parallel prefix scan and its
               | implementation within the open-source CUB library of GPU
               | parallel primitives
               | 
               | The CUB-library also states:
               | 
                | https://nvlabs.github.io/cub/structcub_1_1_device_scan.html
               | 
               | >> As of CUB 1.0.1 (2013), CUB's device-wide scan APIs
               | have implemented our "decoupled look-back" algorithm for
               | performing global prefix scan with only a single pass
               | through the input data, as described in our 2016
               | technical report [1]
               | 
               | Where [1] is a footnote pointing at the exact paper you
               | just linked.
               | 
               | -----------
               | 
               | > It doesn't use the word "spin" but repeated polling
               | (step 4 in the algorithm presented in section 4.1,
               | particularly when the flag is X) is basically the same.
               | 
               | That certainly sounds spinlock-ish. At least that gives
               | me what to look for in the code.
        
               | raphlinus wrote:
               | Ah, good then, I didn't know that, thanks for the cite. I
               | haven't studied the CUB code on it carefully.
        
       | knz42 wrote:
       | A lot of the complexity comes from the lack of expressivity in
       | languages to relate variables (or data structure fields)
       | semantically to each other. If there was a way to tell the
       | compiler "these variables are always accessed in tandem", the
       | compiler could be smart about ordering and memory fences.
       | 
        | The idea of extending programming languages and type systems in
        | that direction is not new: folks who've been doing distributed
        | computing already have to think about this, and could teach a
        | few things to folks who use shared-memory multiprocessors.
       | 
       | Here's an idea for ISA primitives that could help a language
       | group variables together: bind/propagate operators on
       | (combinations of) address ranges.
       | https://pure.uva.nl/ws/files/1813114/109501_19.pdf
        
         | smasher164 wrote:
         | Even with that expressivity, someone who incorrectly relates or
         | forgets to relate two variables could experience the same
         | issues. It's still important to address what happens when the
         | program has data races or when it is data-race-free but the
         | memory model permits overreaching optimizations. The language
         | and implementation should strive to make a program
         | approximately correct.
        
         | dragontamer wrote:
          | That's Java's intrinsic object lock (monitor) mechanism.
          |
          | All variables inside of an object (aka: any class) are assumed
          | to be related to each other. synchronized(foobar_object){
          | baz(); } ensures that all uses of foobar_object inside the
          | synchronized {} block are sequential (and therefore correct).
         | 
         | --------
         | 
         | The issue is that some people (a minority) are interested in
         | "beating locks" and making something even more efficient.
        
           | karmakaze wrote:
           | In Java, any object can be used to synchronize any data, e.g.
           | synchronized(foobar_object){ foo(); }
           | synchronized(foobar_object){ bar(); }
           | synchronized(foobar_object){ baz(); }
           | 
            | This will keep the foo, bar, and baz methods well behaved
            | with respect to any data they share, regardless of whether
            | they are methods of foobar or of any other class(es). It is
            | exactly analogous to the S(a) -> S(a) synchronizing
            | instruction from the article, which establishes a happens-
            | before edge partitioning each thread into before/after the
            | S(a).
            |
            | The only time synchronized(explicit_object) relates to
            | anything else is the method form of the keyword:
            | `synchronized void foo()` is equivalent (with a minor
            | performance difference) to wrapping the entire body of the
            | foo method in `synchronized(this) { ... }`.
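            |
            | A minimal sketch of that guarantee (the names here are made
            | up): the write to the plain field below is guaranteed visible
            | to the reader only because both blocks synchronize on the
            | same object.
            |
            |   final class Gate {
            |       private final Object gate = new Object();
            |       private int shared;   // plain field, not volatile
            |
            |       void publish(int v) {
            |           synchronized (gate) { shared = v; }   // unlock = release
            |       }
            |
            |       int consume() {
            |           synchronized (gate) { return shared; } // lock = acquire
            |       }
            |   }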
        
       | voidnullnil wrote:
       | These "memory models" are too complex for languages intended for
       | dilettante developers. It was a disaster in Java/C#. Not even
       | more than a handful of programmers in existence know in depth how
       | it works, as in, can they understand any given trivial program in
       | their language. At best they only know some vague stuff like that
       | locking prevents any non visibility issues. It goes far deeper
       | than that though (which is also the fault of complex language
       | designs like Java and C#).
       | 
        | The common programmer does not understand that you've just
        | transformed their program - for which they were taught merely
        | that multiple threads need synchronization - into a new game,
       | which has an entire separate specification, where every shared
       | variable obeys a set of abstruse rules revolving around the
       | happens-before relationship. Locks, mutexes, atomic variables are
       | all one thing. Fences are a completely different thing. At least
       | in the way most people intuit programs to work.
       | 
       | Go tries to appeal to programmers as consumers (that is, when
       | given a choice between cleaner design and pleasing the user who
       | just wants to "get stuff done", they choose the latter), but yet
       | also adds in traditional complexities like this. Yes, there is
       | performance trade off to having shared memory behave intuitively,
       | but that's much better than bugs that 99% of your CHOSEN userbase
       | do not know how to avoid. Also remember Go has lots of weird edge
       | cases, like sharing a slice across threads can lead to memory
       | corruption (in the C / assembly sense, not merely within that
       | array) despite the rest of the language being memory-safe.
       | Multiply that by the "memory model".
       | 
       | Edit: forgot spaces between paragraphs.
        
         | filleduchaos wrote:
         | > These "memory models" are too complex for languages intended
         | for dilettante developers.
         | 
         | It would be nice if sometime we stopped pretending that
         | beginners are too slow to know/understand things and instead
         | faced the fact that their instructors and mentors are bad at
         | teaching.
        
         | temac wrote:
         | They are not too complex for "languages intended for dilettante
         | developers": they can (and should) use sequential consistency
         | or locks everywhere.
        
           | voidnullnil wrote:
            | How many programmers do you know who put locks around polling
            | variables for seemingly no reason (given that they are not
            | cognizant of the memory model)?
        
             | pjmlp wrote:
              | Well, while it may appear to be gatekeeping, maybe those
              | dilettante developers should be using something else
              | instead, like BASIC?
        
               | formerly_proven wrote:
               | CPython has a pleasant memory model from what I've heard
               | :)
        
       | electricshampo1 wrote:
       | "Java and JavaScript have avoided introducing weak
       | (acquire/release) synchronizing atomics, which seem tailored for
       | x86."
       | 
       | This is not true for Java; see
       | 
       | http://gee.cs.oswego.edu/dl/html/j9mm.html
       | 
       | https://docs.oracle.com/en/java/javase/16/docs/api/java.base...
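        |
        | Concretely, since Java 9 VarHandle exposes acquire/release
        | accesses directly. A small sketch (class and field names are
        | made up):
        |
        |   import java.lang.invoke.MethodHandles;
        |   import java.lang.invoke.VarHandle;
        |
        |   final class Flag {
        |       private int ready;   // accessed through the VarHandle
        |       private static final VarHandle READY;
        |       static {
        |           try {
        |               READY = MethodHandles.lookup()
        |                   .findVarHandle(Flag.class, "ready", int.class);
        |           } catch (ReflectiveOperationException e) {
        |               throw new ExceptionInInitializerError(e);
        |           }
        |       }
        |
        |       void publish() {
        |           READY.setRelease(this, 1);                // store-release
        |       }
        |
        |       boolean isPublished() {
        |           return (int) READY.getAcquire(this) == 1; // load-acquire
        |       }
        |   }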
        
         | dragontamer wrote:
          | It's not true in general. x86 CANNOT have weak acquire/release
          | semantics. x86 is "too strong"; you get total store ordering by
          | default.
         | 
         | If you want to test out weaker acquire/release semantics, you
         | need to buy an ARM or POWER9 processor.
        
       | bullen wrote:
        | In 100 years the main languages used will still be C on the
        | client (with a C++ compiler) and Java on the server.
       | 
       | Go has no VM but it has a GC. WASM has a VM but no GC.
       | 
        | Everything has been tried and Java still kicks everything's ass
        | to the moon on the server.
        |
        | Fragmentation is bad; let's stop using bad languages and focus on
        | the products we build instead.
       | 
       | "While I'm on the topic of concurrency I should mention my far
       | too brief chat with Doug Lea. He commented that multi-threaded
       | Java these days far outperforms C, due to the memory management
       | and a garbage collector. If I recall correctly he said "only 12
       | times faster than C means you haven't started optimizing"." -
       | Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html
       | 
       | "Many lock-free structures offer atomic-free read paths, notably
       | concurrent containers in garbage collected languages, such as
       | ConcurrentHashMap in Java. Languages without garbage collection
       | have fewer straightforward options, mostly because safe memory
       | reclamation is a hard problem..." - Travis Downs
       | https://travisdowns.github.io/blog/2020/07/06/concurrency-co...
        
         | pphysch wrote:
          | Pretty sure Java is/was popular (and has enormous momentum)
          | because "it just worked" at a time when the Internet was taking
          | off, not because it is some linguistic or technological marvel.
          | It will definitely stick around, just like COBOL and C and Go.
        
           | bullen wrote:
            | COBOL is not around in anything interesting, and trust me, Go
            | is not going to be used to build anything that we'll use in
            | 20 years.
        
             | pphysch wrote:
             | Do I hear "containers are a temporary fad"?
        
               | bullen wrote:
               | Yep
        
         | skohan wrote:
          | I'm sorry, is this comment from 1998? I've been working in
          | software for over a decade, and I haven't seen server work
          | being done in Java in ages.
          |
          | From my perspective, Go in the context of serverless computing
          | currently seems to be the best choice for server-side
          | programming.
         | 
         | In the next 20 years I expect Go will be supplanted by a
         | language which is a lot like go (automatic memory management,
         | simple, easy to learn & write and performant enough) but with
         | the addition of algebraic data types, named parameters, and a
         | _slightly_ higher level of abstraction.
        
           | pjmlp wrote:
           | And I haven't seen server work done in C++ since 2006, we
           | keep replacing those systems with Java and .NET ones.
           | 
           | To each its own.
        
           | bullen wrote:
            | Infinite growth does not exist; everything peaks at some
            | point. You have to wonder, if a memory model from 2005 still
            | kicks Go's ass in 2021, how your prediction that "there will
            | always be something new and shiny to distract us from the
            | focus we need to leverage the real value of the internet"
            | will play out.
            |
            | What have you built with Go that is interesting?
        
             | skohan wrote:
             | It's not about new and shiny. Programming is still a
             | relatively new field when compared to other fields. For
             | instance, mathematics and physics took centuries to land on
             | the right way to formalize things.
             | 
             | C is maybe the only good programming language invented so
             | far. Java was a failed attempt at improving C. I think
             | we're rapidly converging on the second good programming
             | language, and it's not going to have null pointer
             | exceptions.
        
           | valbaca wrote:
           | > In the next 20 years I expect Go will be supplanted by a
           | language which is a lot like go (automatic memory management,
           | simple, easy to learn & write and performant enough) but with
           | the addition of algebraic data types, named parameters, and a
           | slightly higher level of abstraction.
           | 
           | I'd love for this to be Crystal: https://crystal-lang.org/
           | 
           | > I haven't seen server work being done in Java in ages.
           | 
           | In the meantime, I've been doing a large amount of Java
           | backend server work for the past 10 years.
        
             | skohan wrote:
             | I have a feeling it's going to have C-like syntax and
             | frankly I hope so because using an `end` keyword instead of
             | braces makes no sense to me.
        
               | filleduchaos wrote:
               | Arguably using curly braces to delineate blocks makes no
               | inherent sense either. We just do it because that's what
               | everybody else does.
        
       | jqpabc123 wrote:
       | _If thread 2 copies done into a register before thread 1
       | executes, it may keep using that register for the entire loop,
       | never noticing that thread 1 later modifies done._
       | 
        | Alternative solution: forget all the "atomic" semantics and
        | simply avoid "optimization" of global variables. Access to any
        | global variable should always occur directly from memory. Sure,
        | this will be less than optimal in some cases, but such is the
        | price of using globals. Their use should be discouraged anyway.
       | 
       | In other words, make "atomic" the sensible and logical default
       | with globals. Assignment is an "atomic" operation, just don't
       | circumvent it by using a local copy as an "optimization".
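        |
        | For reference, here is the article's example rendered in Java
        | (illustrative only): today the opt-in is per variable, via
        | volatile, rather than a blanket rule for globals.
        |
        |   class Done {
        |       // Without volatile, the JIT may hoist the read of done out
        |       // of the loop and spin forever; with volatile, every
        |       // iteration re-reads it and the writer's store is visible.
        |       static volatile boolean done = false;
        |
        |       public static void main(String[] args) {
        |           new Thread(() -> {
        |               while (!done) { }        // thread 2: poll the flag
        |               System.out.println("observed done");
        |           }).start();
        |           done = true;                 // thread 1: set the flag
        |       }
        |   }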
        
         | dragontamer wrote:
         | Removing optimizations on "global" variables will leave the bug
         | in Singleton objects (which are very similar to global
         | variables, but the compiler doesn't know that they're global)
         | 
         | ---------
         | 
         | "Volatile" is close but not good enough semantically to
         | describe what we want. That's why these new Atomic-variables
         | are being declared with seqcst semantics (be it in Java, C++,
         | C, or whatever you program in).
         | 
         | That's the thing: we need a new class of variables that wasn't
         | known 20 years ago. Variables that follow the sequential-
         | consistency requirements, for this use case.
         | 
         | ---------
         | 
          | Note: on ARM systems, even if the compiler doesn't mess up, the
          | L1 cache could mess you up. ARM has multiple load/store
          | semantics available. If you have relaxed (default) semantics on
          | a load, it may return a "stale" value previously loaded from
          | DDR4.
         | 
         | That is to say: your loop may load a value into L1 cache, then
         | your core will read the variable over and over from L1 cache
         | (not realizing that L3 cache has been updated to a new value).
         | Not only does your compiler need to know to "not store the
         | value in a register", the memory-subsystem also needs to know
         | to read the data from L3 cache over-and-over again (never using
         | L1 cache).
         | 
          | Rearranging loads/stores on x86 is not allowed in this manner.
          | But ARM is more "relaxed" than x86. If reading the "freshest"
          | value is important, you need to have the appropriate memory
          | barriers on ARM (or PowerPC).
        
           | jqpabc123 wrote:
           | _which are very similar to global variables, but the compiler
            | doesn't know that they're global_
           | 
           | Since as you say, they are very similar, wouldn't it be
           | reasonable to assume for access purposes that they are
           | effectively global?
        
             | dragontamer wrote:
              | Let's do this example in Java (but it should be simple
              | enough that C#, Python, JavaScript and other programmers
              | would understand it):
              |
              |     public void myFunction(FooObject o) {
              |         o.doSomething();
              |     }
             | 
              | How does the compiler know whether "FooObject o" is a
              | singleton or not? That's the thing about the "Singleton"
              | pattern: you have an effectively "global-ish" variable, but
              | all of your code is written in the normal pass-the-object
              | style.
             | 
              | EDIT: If you're not aware, the way this works is that you
              | call myFunction(getTheSingleton()), where
              | "getTheSingleton()" fetches the Singleton object.
              | myFunction() has no way of "knowing" it's actually
              | interacting with the singleton. This is a useful pattern,
              | because you can create unit tests over the "global"
              | variable by simply swapping the Singleton out for a mock
              | object (or maybe an object with preset state for better
              | unit testing). It has other benefits too (but also similar
              | downsides to using a global variable: it's difficult to
              | reason about because you have this "shared state" being
              | used all over the place).
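              |
              | Roughly (all names made up to match the description above):
              |
              |   final class FooObject {
              |       private static final FooObject INSTANCE =
              |           new FooObject();
              |       static FooObject getTheSingleton() { return INSTANCE; }
              |       void doSomething() { /* touches shared state */ }
              |   }
              |
              |   // Two threads doing this race on the very same object,
              |   // even though myFunction() never mentions a global:
              |   //     myFunction(FooObject.getTheSingleton());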
        
               | mumblemumble wrote:
               | Is there anything special about singletons here?
               | 
               | In Java, there's no real difference between a singleton
               | and any other object. A singleton is an object that just
               | happens to have a single instance. Practically speaking,
               | they're typically used as a clever design pattern to
               | "work around" Java's lack of language-level support for
               | global variables, so there's that. But I think that that
               | fact might not be relevant to the issue at hand?
               | 
               | The more basic issue is, if you have two different
               | threads concurrently executing `myFunction`, what happens
               | when they're both operating on the same instance of
               | `FooObject`?
        
               | dragontamer wrote:
               | > Is there anything special about singletons here?
               | 
               | No, aside from the fact that the root commenter clearly
               | understands the issue with global variables, but not
               | necessarily singletons.
               | 
               | I'm trying to use the singleton concept as a "teaching
               | bridge" moment, as the Singleton is clearly "like a
               | global variable" in terms of the data-race, but
               | generalizes to any object in your code.
               | 
                | The commenter I'm replying to seems to think that global
                | variables are the only kind of variable where this
                | problem occurs. He's wrong. All objects and all variables
                | have this problem.
        
         | ekiwi wrote:
         | > Access to any global variable should always occur direct from
         | memory.
         | 
         | What if your function takes a pointer that might be pointing to
         | a global variable? Does that mean that all accesses through a
          | pointer are now exempt from optimization unless the compiler
         | can prove that the pointer will never point to a global
         | variable?
        
           | jqpabc123 wrote:
           | What if your function takes a pointer that might be pointing
           | to an "atomic" variable?
           | 
           | Pointers can be used to circumvent most safety measures. If
           | you obscure the access, you should assume responsibility for
           | the result.
        
           | cbsmith wrote:
           | The great thing about simple memory models is they work
           | really well until you think about it. ;-)
        
         | mumblemumble wrote:
         | This problem isn't specific to global variables; it happens
         | with all shared mutable state. I would assume that the author
         | only used global variables because that lets them keep the
         | working examples as short as possible, and minimize irrelevant
         | details.
         | 
         | And yes, you _can_ put a full memory fence around every access
         | to a variable that is shared across threads. But doing so would
         | just destroy the performance of your program. Compared to using
         | a register, accessing main memory typically takes something on
          | the order of 100 times as long. Given that we're talking about
         | concerns that are specific to a relatively low-level approach
         | to parallelism, I think it's safe to assume that performance is
         | the whole point, so that would be an unacceptable tradeoff.
        
           | dragontamer wrote:
           | > And yes, you can put a full memory fence around every
           | access to a variable that is shared across threads. But doing
           | so would just destroy the performance of your program. Given
           | that we're talking about concerns that are specific to a
           | relatively low-level approach to parallelism, I think it's
           | safe to assume that performance is the whole point, so that
           | would be an unacceptable tradeoff.
           | 
           | Indeed.
           | 
            | Just a reminder to everyone: your pthread_mutex_lock() and
            | pthread_mutex_unlock() functions already contain the
            | appropriate compiler / cache memory barriers in the correct
            | locations.
           | 
           | This "Memory Model" discussion is only for people who want to
           | build faster systems: for people searching for a "better
           | spinlock", or for writing lock-free algorithms / lock-free
           | data structures.
           | 
            | This is the stuff of cutting-edge research right now: it's a
            | niche subject. Your typical programmer _SHOULD_ just stick a
            | typical pthread_mutex_t onto an otherwise single-threaded
            | data structure and call it a day. Locks work. They're not
            | "the best", but "the best" is constantly being researched /
            | developed right now. I'm pretty sure that any new lock-free
            | data structure with decent performance is pretty much instant
            | Ph.D. thesis material.
           | 
           | -----------
           | 
            | Anyway, the reason "single-threaded data structure behind a
            | mutex" works is that your data structure still keeps all of
            | its performance benefits (from sticking to L1 cache, or
            | letting the compiler "manually cache" data in registers when
            | appropriate), and you only lose performance on the lock() or
            | unlock() calls (which will innately have memory barriers to
            | publish the results).
            |
            | That's 2 memory barriers (one barrier for lock() and one
            | barrier for unlock()). The thing about lock-free algorithms
            | is that they __might__ get you down to __1__ memory barrier
            | per operation if you're a really, really good programmer. But
            | it's not exactly easy. (Or: they might still have 2 memory
            | barriers, but the lock-free properties of "always forward
            | progress" and/or deadlock freedom might be easier to prove.)
            |
            | Writing a low-performance but otherwise correct lock-free
            | algorithm isn't actually that hard. Writing a lock-free
            | algorithm that beats your typical mutex + data structure,
            | however, is devilishly hard.
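            |
            | The Java-flavoured version of that advice, as a sketch (the
            | names are made up; the pthread_mutex_t becomes a
            | ReentrantLock or a synchronized block):
            |
            |   import java.util.HashMap;
            |   import java.util.Map;
            |   import java.util.concurrent.locks.ReentrantLock;
            |
            |   final class LockedMap<K, V> {
            |       // Plain, non-thread-safe structure behind one lock.
            |       private final Map<K, V> map = new HashMap<>();
            |       private final ReentrantLock lock = new ReentrantLock();
            |
            |       V get(K key) {
            |           lock.lock();                 // acquire barrier
            |           try { return map.get(key); }
            |           finally { lock.unlock(); }   // release barrier
            |       }
            |
            |       void put(K key, V value) {
            |           lock.lock();
            |           try { map.put(key, value); }
            |           finally { lock.unlock(); }
            |       }
            |   }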
        
             | voidnullnil wrote:
             | > This "Memory Model" discussion is only for people who
             | want to build faster systems: for people searching for a
             | "better spinlock", or for writing lock-free algorithms /
             | lock-free data structures.
             | 
             | Actually, most practitioner code has bugs from their
             | implicit assumptions that shared variable writes are
             | visible or ordered the way they think they are.
        
               | dragontamer wrote:
               | But the practitioner doesn't need to know the memory
               | model (aside from "memory models are complicated").
               | 
               | To solve that problem, the practitioner only needs to
               | know that "mutex.lock()" and "mutex.unlock()" orders
               | reads/writes in a clearly defined manner. If the
               | practitioner is wondering about the difference between
               | load-acquire and load-relaxed, they've probably gone too
               | deep.
        
               | voidnullnil wrote:
               | > To solve that problem, the practitioner only needs to
               | know that "mutex.lock()"
               | 
               | This is true, but they do not know that. If you do not
               | give some kind of substantiation, they will shrug it off
               | and go back to "nah this thing doesn't need a mutex",
               | like with a polling variable (contrived example).
        
               | Kranar wrote:
               | Can you explain what you mean by a "polling variable"
               | needing a mutex? Usually polling is done using atomic
               | instructions instead of a mutex. Are you referring to
               | condition variables?
        
               | mumblemumble wrote:
               | In a lot of code I've seen, there are threads polling
               | some variable without using any sort of special guard.
               | The assumption (based, I assume, on how you really could
               | get away with this back in the days of single-core,
               | single-CPU computers) is that you only need to worry
               | about race conditions when writing to primitive
               | variables, and that simply reading them is always safe.
        
               | Kranar wrote:
               | Okay but the poster mentioned a mutex, which would not be
               | a good way to go about polling a variable in Java. All
               | you need to guarantee synchronization of primitive values
               | in Java is the use of volatile [1]. If you need to
               | compose atomic operations together, then you can use an
               | atomic or a mutex, but it would not occur to me to use a
               | mutex to perform a single atomic read or write on a
               | variable in Java.
               | 
                | [1] https://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html...
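                |
                | In other words (illustrative; the names are made up):
                | volatile covers the visibility of a polled flag, while a
                | composed read-modify-write needs an atomic or a lock.
                |
                |   import java.util.concurrent.atomic.AtomicInteger;
                |
                |   class Example {
                |       // Visibility is all a polled stop flag needs.
                |       static volatile boolean stopRequested = false;
                |
                |       // "volatile int hits; hits++;" would still race,
                |       // because ++ is a read-modify-write; composing it
                |       // needs an atomic (or a mutex).
                |       static final AtomicInteger hits = new AtomicInteger();
                |
                |       static void work() {
                |           while (!stopRequested) {
                |               hits.incrementAndGet();
                |           }
                |       }
                |   }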
        
       | wcarss wrote:
       | The prior article in this series from ~a week ago is 'Hardware
        | Memory Models', at https://research.swtch.com/hwmm, with some HN
        | discussion here: https://news.ycombinator.com/item?id=27684703
       | 
       | Another somewhat recently posted (but years-old) page with
       | different but related content is 'Memory Models that Underlie
        | Programming Languages':
        | http://canonical.org/~kragen/memory-models/
       | 
       | a few previous hn discussions of that one:
       | 
       | https://news.ycombinator.com/item?id=17099608
       | 
       | https://news.ycombinator.com/item?id=27455509
       | 
       | https://news.ycombinator.com/item?id=13293290
        
       ___________________________________________________________________
       (page generated 2021-07-06 23:00 UTC)