Hacker News | atrust's comments

In a few simple words, can someone explain what "atomically" means? I've personally used this term when talking about some Redis operations, but never knew the real gist of the word and the concepts behind it. I have only a very brief understanding of the term, and if I had to explain it to someone, I'd say it's "an operation that does not have any side effects when performing its unit of work". Is my understanding even close to what an atomic operation really is?


If something alters an object such that it goes from state A to state B, it might need to do some work along the way (call it "state C"). Atomicity means that if the operation is interrupted or observed while it's going on, the existence of a "state C" never leaks out. It's always in either state A or B; any third state that might exist along the way is never visible. (Hence what people often say: the operation is either completed or didn't happen at all.)

Renaming a file is a good example. Within the internal structure of the filesystem, you have a directory entry in an old location. That must be removed. You may have another file with the same name in the destination directory. That file must be overwritten. Internally, these things happen by a multi-step process, e.g.: remove the entry for the old name, remove the pre-existing entry for the new name, create a new entry for the new name. But the system creates the appearance of just one step. You don't get "file not found" while it's overwriting the destination file. You don't ever see the file having both old and new names at the same time.


edit: I just realized you said "renaming." Original comment left below, but I edited before I could get downvoted for a classic reading comprehension fail.

Atomicity requires that the leakage mentioned above not be visible from any context aside from the operation's own internal context. That makes your example somewhat of a simplification, because these state transitions are visible to other processes. It is a common mistake to try to use files for locking, for example, instead of using the more robust flock(1).


> It is a common mistake to try to use files for locking, for example, instead of using the more robust flock(1).

Why is this a mistake? It is my understanding that, if all the locking you need is a simple mutex, creating a file with a well-defined name with O_CREAT | O_EXCL is atomic -- the file will either be created or not (in which case the call will fail with EEXIST), and no two processes can possibly both succeed at creating the file. This even works on NFS; it was apparently broken in the NFS client in Linux 2.6.5 and below, but it is supposed to work in NFS, and is generally the only reliable way of getting locks in NFS.

You don't get any better way to wait on the lock than re-trying to create the file, and you don't have any mechanism for dealing with clients that die while holding the lock (i.e., it's an aggressively CP system), but for what it does, it's supposed to work correctly and atomically.
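The O_CREAT | O_EXCL trick is easy to reproduce from a shell, since `set -C` (noclobber) makes the `>` redirection use the same create-or-fail semantics. A minimal sketch, with a made-up lock path:

```shell
#!/bin/sh
# Create-or-fail locking via noclobber: with set -C, the > redirection
# opens the file with O_CREAT|O_EXCL, so only one process can win.
# The lock path is hypothetical; $$ keeps it unique for the demo.
LOCK="${TMPDIR:-/tmp}/excl.$$.lock"

if (set -C; echo "$$" > "$LOCK") 2>/dev/null; then
    echo "lock acquired"
    # ... critical section ...
    rm -f "$LOCK"
else
    echo "lock busy"    # someone else created the file first
fi
```

The subshell keeps noclobber from leaking into the rest of the script, and writing the PID into the file is a common convention so humans can see who holds the lock.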


The flock() method is preferable when you don't need to use NFS because, as you say, it'll automatically clean the lock up if the process holding it dies.

This gets rid of all the edge cases with stale locks in one fell swoop.

But as you point out, if you want to do this e.g. over NFS, you have to create a file, and then you need to deal with stale locks.

If you can at all avoid that, using flock() is generally better.


http://0pointer.de/blog/projects/locking.html claims that flock() is less reliable over NFS (returns true without actually locking anything on Linux < 2.6.12 and "BSD" - not sure which BSDs or whether that's still true).

And my instinct is that in a networked scenario, you're at least as worried about a machine dying as a process on the machine (i.e. a network partition). A flock()-based lock doesn't clean itself up if the client is unreachable, does it?


Yes, as I pointed out, you don't want this if you're doing NFS.

Personally I prefer something like a MySQL table with GET_LOCK() to process things instead of NFS if I need multiple machines. It gives you flock()-like semantics, in that if a machine or client goes away the GET_LOCK() is automatically freed; i.e., it survives as long as the connection to the database survives.

The extra overhead of a database generally sucks way less than having to deal with stale locks.

For any NFS-based scenario you usually end up creating "task", "task.underway", and "task.done" files as locks, and re-enqueuing tasks when you have an "underway" file that's too old without a "done" file.

You'd do the same with a MySQL table that you GET_LOCK() on, except you can safely re-enqueue "underway" tasks if you acquire the lock on them, since you know their consumers have gone away.


Technically, you're right that it's atomic to create a file. But creating a lock using a file can be deceptive and is a common pitfall in my experience. I have seen a lot of shell scripts take this form:

  if [ ! -f "$FILE" ]; then
      touch "$FILE"
      # do something dangerous, assuming I have a lock
      rm "$FILE"
  fi
The problem here is, of course, that there's a window between checking whether $FILE exists and creating it. Another process (even a concurrent execution of the same script) can run the same check in that window, see no file, and happily proceed to create $FILE, thinking that no one else is executing simultaneously. If I ran two executions of this script at about the same time, they could both pass the check and execute the supposedly "synchronized" block.

Of course, you don't have to use flock(1) to make this operation atomic. It just handles a lot of the extra work that I don't want to have to think about, even if I did set `noclobber` or something like that.
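For comparison, here is a sketch of the same shape using flock(1) from util-linux, with an invented lock path. The kernel drops the lock when the file descriptor closes, so a crashed script can't leave a stale lock behind:

```shell
#!/bin/sh
# Guarding a critical section with flock(1). The lock lives on fd 9;
# it is released automatically when the subshell (and thus fd 9)
# exits, even if the script is killed. The path is hypothetical.
FILE="${TMPDIR:-/tmp}/flockdemo.$$.lock"

(
    flock -n 9 || exit 1       # non-blocking: fail if already held
    # do something dangerous, actually holding a lock
    echo "working under lock"
) 9> "$FILE"
```

Note that the lock file itself is never removed; that's fine, because the file's existence is not the lock, the kernel-level flock on it is.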


Nonetheless the reply prompted me to look into the rename case specifically. Apparently on Linux, the replacement of the destination file is atomic (as many of us already knew and take for granted), but there's no guarantee that you won't see both old and new names in flight for a brief moment (contrary to the last sentence of my earlier comment).

Would not be surprised if all bets are off once you get an NFS mount involved.

As always it's a tradeoff between useful behaviors and the cost of synchronizing.
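The usual way to exploit that atomic replacement is the write-temp-then-rename pattern: build the new content in a temporary file on the same filesystem, then rename(2) it over the destination, so readers see either the complete old file or the complete new one. A sketch with made-up paths:

```shell
#!/bin/sh
# Atomic file replacement: write a temp file, then mv (rename(2)) it
# over the destination. mv is only atomic like this when source and
# destination are on the same filesystem. Paths are invented.
DST="${TMPDIR:-/tmp}/settings.$$"
TMP="$DST.tmp"

echo "old contents" > "$DST"
echo "new contents" > "$TMP"
mv -f "$TMP" "$DST"            # readers never see a half-written file

cat "$DST"                     # prints "new contents"
```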


Not really. It just means that it's indivisible (the original meaning of "atom"). Either it succeeds or fails, you never have to worry about it being half-finished. This includes actions which are so small they are literally indivisible, or actions which roll back to the original state if they fail.


Not just about it only completing or failing, but another observer in the system should never be able to find it in the half-way state. To everyone but the implementer of the atomic operation, it has no half-way states.


In the database terminology that's usually called "Isolation" to keep the concept separate from the rather restricted definition of atomicity.


Got it. Now it makes more sense to me. I see that people tend to talk about atomicity when it comes to low-level-ish things. But say I create some sort of a web service with a bunch of business logic. Is it worth following this principle in that case? For instance, when a client sends an API request (let's say "add user to friends"), is it even possible to apply atomicity to this type of thing?

Edit: Thanks everyone for taking time to explain it to me.


Atomicity can be important at any level. For example, assume that adding a friend involves two edges in a graph, one edge in each direction. Now assume that there is some other piece of code that does some analysis or processing of friend relationships. This piece of code might rely on there always being two edges, and crash or give erroneous results if not.

Thus, the adding of the two edges must be atomic (when observed from the rest of the system).

(The example is a bit contrived, but hopefully gets the idea across.)


Yes. Depending on how you store the relationships, it's possible that when A adds B to its list of friends, A must also be added to B's list of "friend-ofs". If they're stored separately, the edit may not be atomic. If the relationship is stored in a single place (queryable from either direction), it's generally going to be atomic, unless you're doing something way outside the norm.


Yes. To take a larger example action, creating a user could fail to be atomic if, say, it stored the username in a separate table from the user object, wrote the username first, then referenced it from the user object, but didn't roll back the username insert if the user object insert failed.

Likewise, for the seemingly simpler example of establishing a friend relationship, you may be tracking that relationship in both directions, in which case one write could fail and the other succeed.


The business logic of your web application should revolve around database calls. Popular databases already guarantee these atomicity properties for you through transactions.


> The business logic of your web application should revolve around database calls

Might, not should. Not all web apps are just front ends to a single database where transactions are useful, and once you leave the realm of a single database for a more distributed system, transactions are no longer an option.


Database guarantees are great until your data doesn't fit in a single database; it's good to examine what your database is providing for you, and to contemplate the costs for providing it in the database level.


Yes. But in this case, the concept you are probably looking for is "transactional". Transactions are atomic, but the difference is that they can fail (the state is then rolled back), and they can be retried at a later point.


Here are some real-world examples of what can go wrong if your API requests aren't atomic: http://josipfranjkovic.blogspot.jp/2015/04/race-conditions-o...


(A bit late, but hoping that you see this:)

If possible you should probably go for an even stronger property, namely Idempotence[1]. (This can be relatively easy if you can force clients to provide some sort of unique token for every operation.)

It usually makes things even easier for clients to reason about, since they can just retry anything, knowing that it doesn't matter if they retry an already-applied operation.

[1] https://en.wikipedia.org/wiki/Idempotence


Reminds me of graphics double buffering. Back in the day, games would write directly into the video buffer while the graphics chip scanned that same buffer and pushed its contents to the screen at the same time.

If your code is too slow (a complex effect, too many characters on screen at that point), you might not be done writing a full frame when the graphics chip starts to output the pixels.

This means your TV is now showing partly old and partly new state. Nothing important most of the time; it's only games, it's only a few ms of wrong information, and people's brains can compensate. It is ugly to see, though. You get that weird "tear" line somewhere on the screen.

So people changed the structure a bit. With two (or more) buffers, the program computes the new image in one buffer B while the chip shows another buffer A. When you are done with a picture, the chip switches to scanning B, while you can write into A. This means the output never shows a partial frame anymore.


Translating into English from the Italian Wikipedia: from Ancient Greek ἄτομος - àtomos - [indivisible], made of ἄ - a - [privative alpha] + τέμνειν - témnein - [cut].

Personally, I struggled a long time before fully understanding its use in IT, because I learned programming after sub-nuclear physics, so I had a hard time reconciling the huge atom (a million billion times bigger than a nucleus) with the concept of "cannot be split" :-)


It is helpful to understand what problem it solves.

Let's say that we have a banking application that consists of a program which updates someone's bank account by $Y every time it is called. Y is a command line parameter. The program's algorithm is like this:

1. Read the current balance amount to X

2. Add Y to X and store it in Z

3. Write Z to the database.

This program cannot safely be called by multiple processes at the same time. Let's say that it is payday, the account holder holds two jobs, and each employer is trying to deposit $10 into the account at the same time. Both these processes call the program with Y = $10. What happens?

1. Process 1 reads the current balance ( $100 ) to X

2. Now, process 2 reads the current balance ( $100 ) to X

3. Process 1 adds 10 to X ( Z = 110 )

4. Process 2 adds 10 to X ( Z = 110 )

5. Process 1 writes the updated value to the database ( Z = 110 )

6. Process 2 writes the updated value to the database ( Z = 110 )

Now the account reflects a balance of $110, when it should have reflected $120. What we need is a guarantee from the system that some actions will not be interleaved (i.e., they will be atomic). From TFA it is given that "mkdir" is an atomic operation in UNIX (i.e., only one process can create a given directory at a time). So you can write the program with the following logic:

1. mkdir /tmp/lock_dir

2. If the above step was unsuccessful, sleep 10 seconds and go back to step 1

3. Read current account balance to X

4. Add Y to X and store it in Z

5. Write Z to database

6. Remove /tmp/lock_dir

Multiple processes can now safely invoke this program simultaneously.
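The six steps above can be sketched directly in shell. This is a toy, of course: the balance lives in a plain file, the paths are invented for the demo, and real code would also have to handle a crash that leaves the lock directory behind:

```shell
#!/bin/sh
# mkdir-based locking around a read-modify-write of a balance file.
# mkdir(2) is atomic: of several concurrent callers, exactly one
# creates the directory and the rest fail. Paths are hypothetical.
LOCK_DIR="${TMPDIR:-/tmp}/balance.$$.lock_dir"
BALANCE_FILE="${TMPDIR:-/tmp}/balance.$$"
Y=10                               # the deposit amount

echo 100 > "$BALANCE_FILE"         # starting balance for the demo

until mkdir "$LOCK_DIR" 2>/dev/null; do
    sleep 10                       # step 2: lock busy, wait, retry
done
X=$(cat "$BALANCE_FILE")           # step 3: read current balance
Z=$((X + Y))                       # step 4: add the deposit
echo "$Z" > "$BALANCE_FILE"        # step 5: write it back
rmdir "$LOCK_DIR"                  # step 6: release the lock

cat "$BALANCE_FILE"                # prints 110
```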


It means that the operation can't be divided any smaller--that it is impossible to catch it only part-way completed; it either hasn't happened yet, or has completely happened.


> I'd say it's "the operation that does not have any side effects when performing its unit of work". Is my understanding even close to what atomic operation really is?

No. It can have as many side effects as it wants. Atomicity means: when going from state 1 to state 2, no matter how complex the transition, there are no externally observable intermediate states.


As the others have said, indivisible operations. This is important in the context of race conditions: imagine two threads incrementing a counter with no locks. Using non-atomic ops to read, increment, and then store the number will lead to a bad time. Or, more on topic, checking if a file exists and then trying to open it - lots of bad things can happen.


Why one would want an atomic operation is easier to understand than what it is.

For example, one can use the atomic nature of creating a symbolic link on *nix to build a lock and prevent a race condition in a forking shell script. Say you have two or more processes wanting to do something that can (or should) only be done by one process at a time; one naive solution is to manage access to said action by using a lock file. However, writing or touching a file itself is not atomic.

The answer is to throw a symbolic link into the mix. In this scenario, the lock file already exists. However, the lock is not the file itself, but a symbolic link to the file. The protocol for each process to follow is:

1. try to create a symbolic link to lock file (any file really)

2. if successful, proceed; if failed, wait (or exit)

3. when process is done, delete symbolic link to lock file

Simply checking for the existence of the symlink is not sufficient since there is a period of time between checking for the symlink (or file) and proceeding with said action where another process can think it has the lock.

The OS ensures that one and only one symlink (of the same name) can exist; attempts to create it again (even simultaneously) will result in a failure of one process to create the symlink. There is one winner; all others are losers. This is to say, the kernel ensures that the operation is atomic. As a result, the OS is now arbitrating what process can proceed to action, at the very lowest level. Another way to think about it is that it provides a way to make competing processes serialize - or get in line so that they may complete their action one at a time.

In my experience, it is important to experiment and test to make sure that the atomic primitive you're using actually works as expected. I've run up against some inconsistent implementations of symlink creation that make this not as straightforward to use as one is led to believe.
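A sketch of the protocol above, with invented paths; ln -s calls symlink(2), which atomically fails with EEXIST if the link name is already taken:

```shell
#!/bin/sh
# Symlink-based locking: step 1 tries to create the link, step 2
# branches on success/failure, step 3 removes the link when done.
# Paths are made up; a common variant points the link at the PID.
TARGET="${TMPDIR:-/tmp}/symdemo.$$"
LINK="${TMPDIR:-/tmp}/symdemo.$$.lock"
touch "$TARGET"

if ln -s "$TARGET" "$LINK" 2>/dev/null; then   # step 1
    echo "lock acquired"                       # step 2: we won
    # ... do the exclusive work ...
    rm -f "$LINK"                              # step 3: release
else
    echo "lock held elsewhere"                 # step 2: wait or exit
fi
```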


Atomic means "does precisely what it is asked to do, or does nothing". This is the core of synchronization primitives, which prevent two tasks from doing the same work or accessing the same resource at once. open(O_CREAT | O_EXCL), for example, opens a new file for you, or reports an error. Regardless of who else is attempting operations at that moment, at most one of those calls will succeed (they can all fail, for a number of reasons). This allows you to, say, create a lockfile (often named filename.lock or .filename.lock) that serializes access to a shared resource, like a simple database or a printer control port or network connection. This is core to multithreaded programming (at least, imperative multithreaded programming; functional languages tend to abstract this away by having little or no shared state).


It's more about multiple operations taking effect "instantaneously" from the perspective of some observer, and sometimes it also implies that either all or none of the operations take effect, but never just some of the operations.


I googled for "atomic operation" and this was top result:

https://en.wikipedia.org/wiki/Linearizability


It's obvious that the author doesn't know all the details and you keep asking all of these questions over and over again, making yourself look like a troll.

I have a feeling that the _magical_ Peruvian cure is based on eating blister beetles, which are a popular treatment in Chinese medicine. Apparently, chemicals these bugs make could be beneficial in treating certain kinds of cancer. There are a few research efforts going on on that front.

http://www.ncbi.nlm.nih.gov/pubmed/24733674

http://www.academia.edu/1279793/Antitumor_Activity_of_Mylabr...


I'd love to see an option to see the original design of all the mails.


I loved Perl 5. Now that Perl 6 is out, I'm a bit confused about versioning. As far as I understand, Perl 6 files use the .pl6 extension. Why would that be different from regular .pl, given that it's just a new version of the language? I mean, I have never seen things like .js5, .php6, etc. Next, is `use v6;` mandatory? Quite confusing. Am I allowed to do `use v2;`?


Your confusion lies in thinking of it, as you've said, as "just a new version of a language." It's actually been a long, 15-year journey that resulted in a completely brand new language. The name is still 'Perl 6' for historical reasons, but it may as well be called 'Rakudo', something people haven't heard of yet. The Perl 5 language is still being developed and improved, with a major release every year. Hence the filename distinction.

However, it's just a convention. You can have any extension you like (at least on *nix).

And to answer your question, no, "use v6" is not mandatory. It's there so that if you run your script with a perl (and not perl6) interpreter, you won't get confusing errors.


Of course one can have any extension, and it's not a bad thing to use .p6. I was just trying to work out for myself how beneficial the new extension is. Part of that lies in "marketing" the language. Old-school engineers may still want to use .pl, whereas newbies will follow the documentation and have their files suffixed with .p6. My worry is that such behavior may lead to confusion in the open source world and push newcomers away from using the language.

I've been following the journey for, more or less, the past 10 years. I hope that whoever spent those 15 years on developing Perl 6, will reap the rewards.


Most of the modules I have seen are using .pm6 and .pl6.



The extension is not mandatory. Usually people use .p6, but you can also use .pl. Notice that I've never seen .pl6.

'use v6;' is not mandatory either, but it allows your program to display a helpful error message if you try to run it with the Perl 5 interpreter. Try "perl -e 'use v6;'"; you should get:

    Perl v6.0.0 required--this is only v5.20.2, stopped at -e line 1.
    BEGIN failed--compilation aborted at -e line 1.
'use v2;' will not run with the Perl 6 interpreter, but 'use v5;' will, as the Perl 6 interpreter is specced to be able to run Perl 5, albeit with certain limitations (no XS, for instance).


> Notice that I've never seen .pl6

Check your Rakudo's tools folder.


Understood. Thanks! I might have confused .p6 with .pl6. Thanks for the clarification on that.


`use v6;` actually means "use the latest version of Perl 6". I believe you can even do `use v6.b` and other versions to target specific releases of Perl 6, though I'm not sure if that's implemented yet.

Some text editors also will change to a Perl 6 highlighter if this pragma is present.


> I have never seen things like .js5, .php6, etc.

Some Fortran people do it.


I wonder, how many people shared it on Facebook?


what is your most upvoted comment?



Well, I'm not kidding at all. I read a bunch of reviews of Elance/Odesk/Upwork/etc. Most of what I heard was "stay away from them if you are looking for something really good". I also found a couple of companies that do (per the reviews) a relatively good job, but the prices are crazy ($300-500 per article). Some other companies say "$30 for a 1000-word article". So I was a bit confused by all of these costs.


What do you mean by "really good"? Spelt right and mildly engaging, or the next E.B. White's "Here Is New York"?

The former you'll find on Upwork or Textbroker, no problem. The latter will be a long search for a great writer who creates exactly the work you want.

Writing is like most other unregulated professional services in that costs vary wildly, as does quality, and the two are not necessarily correlated.

Get samples, ask friends for recommendations, find blog posts you like and e-mail the authors, give small easily completed sample tasks that grow into big assignments and build a solid relationship with one vendor over time.


Elance and Odesk merged to become Upwork.

What you heard about those sites is incorrect-- or, more precisely, incomplete. It's like my saying "A lot of the people on GitHub couldn't build a program to display 'Hello World!'"

Many of the writers on Upwork who bid on jobs are worthless-- people who have trouble writing a coherent sentence. Others are fronts for offshore content mills.

You will also find (1) copywriters who got thrown out of work when their ad/PR agency folded, (2) reporters who have been laid off as their papers shrank, (3) staff writers for magazines who quit to have kids (or, if they're men, to raise a child), (4) retirees who need cash, (5) people with disabilities who need to work from home, or (6) someone working 40 hours but looking to moonlight for extra cash.

Unlike (say) database administration, where you can't judge competence from a resume, you can evaluate a writer's ability in 30 seconds-- ask for a writing sample and read what they write.

The $300-500 places price for one job-- they assume they do it, you go away, and they never see you again. Part of the cost is the research time they need to become knowledgeable. For people who can provide a steady income stream, they charge a lot less.

I think it's simpler to find one of the many people who need work-- and would kill for a chance to make $100-$300 apiece-- but it's up to you.


Well done. This one reminds me of http://helloracer.com/webgl/.


I don't consider myself an expert, but the front page doesn't look interesting to me. Maybe in your case it would make more sense to learn from front pages similar to Expedia/Kayak. At the very least, A/B test it.


I've been using Low Power Mode all the time since the iOS 9 update. It does work well.

