[HN Gopher] Using the Linux kernel's Case-insensitive feature in... ___________________________________________________________________ Using the Linux kernel's Case-insensitive feature in Ext4 Author : mfilion Score : 50 points Date : 2020-08-29 11:56 UTC (1 days ago) (HTM) web link (www.collabora.com) (TXT) w3m dump (www.collabora.com) | muststopmyths wrote: | Until I read the comments to TFA I had no idea how passionately | people get their knickers in a twist over case-insensitivity of | filenames. | | Very interesting feature, of course. | MayeulC wrote: | That comment section is pretty interesting to read... | | I don't really have a use case for that feature right now. But | that doesn't mean others don't have a use-case for this | feature. | | As the comment section reads, most feel entitled to NOT HAVE | this convenience feature implemented for others, because they | do not need it. It feels like a cognitive bias, but does it | have a name, besides "get off my lawn?" | bxparks wrote: | I definitely prefer case-sensitive. I didn't realize that MacOS | had switched to case-INsensitive at some point. So the following | drove me crazy for several minutes: | | $ ls -l /usr/local/bin/virtualbox | | -rwxr-xr-x 1 root wheel 77 Oct 2 2015 | /usr/local/bin/virtualbox* | | $ ls /usr/local/bin | grep virtualbox | | <nothing, WTF??> | | $ ls /usr/local/bin | grep -i virtualbox | | VirtualBox* | | My coworker uses MacOS, I use Linux. Several times, changing | the case of a directory or file caused the MacOS to mess up the | local git repo so badly that we had to blow it away and refetch | from remote. | | At least the Mac Finder allows changing the case of a filename. | Windows File Explorer simply refuses. I change the case of the | file name, add a random character, save the file name. Then | edit the file name again, remove the random character, save the | file name. It's needlessly annoying. 
| | I think case-sensitive makes more sense for software | developers, since there are many instances where things like | "string.h" is _not_ the same as "String.h". But for normal | people, case-INsensitive may be more useful. However, even for | normal users, I think there are situations where "bob.html" (a | noun) is different from "Bob.html" (a person). | zerocrates wrote: | Mac has been (at least by default) case-insensitive for a | very long time, back to the original HFS at least, and | probably its predecessor MFS too, though I'm not sure on that | one. | bxparks wrote: | I thought that I had used Macs with HFS+ filesystem that | was case-sensitive a long time ago, but I'm not 100% sure | anymore. | muststopmyths wrote: | >At least the Mac Finder allows changing the case of a | filename. Windows File Explorer simply refuses | | Just tested on Windows 10 and this is definitely not true. I | could have sworn you could always do it, but my memory of | Windows < 10 is foggy. And you can also do the renaming from | any console app (cmd, powershell, etc), FYI. | | NTFS is case-preserving, but case-insensitive. That, IMO is a | good balance. When they were first making NT with the posix | subsystem, I suppose they found that the number of cases | where two files that only differed in case lived in the same | directory were small enough to not matter. | | I personally don't think case-sensitivity in file names adds | anything useful. | bxparks wrote: | I swear that my Windows 10 File Explorer had this problem. | | Here is a user complaint about this problem from Nov 2019: | https://answers.microsoft.com/en- | us/windows/forum/all/cant-r... | | Here's another post from May 2018, explicitly mentioning | Windows 10: https://answers.microsoft.com/en- | us/windows/forum/all/why-ca... | | But I checked again on my own Windows 10 machine (Windows | 10 Pro, Version 2004, build 19041.450) and holy shit, it | now works. 
It must have been fixed in a recent Windows 10 | update. | tiraniddo wrote: | If you enabled the posix subsystem NTFS became case | sensitive as well, although most APIs passed a flag to | disable that for Win32 file calls. | | It's interesting that Microsoft effectively added the | reverse feature to NTFS [1] to support per-directory case | sensitive files to be more compatible with Linux. | | [1] https://www.tiraniddo.dev/2019/02/ntfs-case- | sensitivity-on-w... | tjpnz wrote: | Because it can be rather frustrating to deal with problems | owing to case insensitivity on systems that allow it. | colejohnson66 wrote: | How? If your program asks the OS for a handle for "data.db" | and it returns one for "Data.db", _does it matter_? If your | program asked for "data.db", why should your program care | that the only match was for "Data.db", so it got that? | [deleted] | gpvos wrote: | Interesting in the sense of "May you live in interesting | times", yes. | | It introduces near-endless complications, especially in the | face of Unicode, and one should think thoroughly about it | before implementing it, as you can read in Torvalds' initial | reaction linked elsewhere in this discussion. But the way they | did it seems relatively sane to me, with an easy way to | compartmentalize it to, e.g., a Steam directory. | mixmastamyk wrote: | As someone who grew up on Commodore, DOS/Win, and later Unix, I | just never had a problem with case sensitivity or lack of. | _shrug_ | | 99% of the time I use lower-case only filenames, as they are | easier to read. The times I don't shell completion and/or GUI | selection obviate the need to care anyway. Given the significant | complexity of Unicode I'm not sure insensitive is the way of the | future. | | There was a window of time where insensitive made the most sense, | the time of DOS. 8-bit per character filenames, with a very | primitive CLI shell, ie. no assistance. Now? In the days when a | majority of users don't even see filenames? 
Meh. | akdor1154 wrote: | What are the benefits to doing this? From the article I got two, | "wine can get the kernel to do case insensitive path stuff | instead of emulating it itself", and "users don't need special | userspace magic to treat their filenames as strings, not bytes". | | I have never seen anyone at all mope over lack of the latter, and | the former seems quite specific to get such a big feature landed | over. What other use-cases are there that benefit from this being | in the FS? | Sesse__ wrote: | Samba is another case, and a very important one at that for a | lot of users. | MayeulC wrote: | I think it might somewhat be required for interoperability with | some other filesystems (exfat under some conditions, hpfs | likewise). | | Moreover, as complex as the problem can be, it might come in | handy to have a unicode normalization subsystem in the kernel | for other modules to use. I don't think it's in the FS, but | it's now a feature of the kernel that various FS can choose to | wire up with, and expose knobs to the user to leverage it. | | Besides that, whether they do or not is their choice. But I | believe this subsystem is going to get used a lot more than | some other, obscure kernel areas ;) | MayeulC wrote: | "edit": s/hpfs/apfs/ | myself248 wrote: | I mope at the lack of the latter, and it kept me off linux for | twentysome years. Only now that I do most of my work in a GUI | and I just click on filenames rather than typing them, does a | case-sensitive filesystem not grind my gears. Every time I have | to cd Downloads instead of cd downloads, I wonder who thought | that was a good idea. | | Case-sensitivity is a classic case of users being forced to | comply with the computer's needs, rather than the other way | around. I contend that that is Wrong, period. | | To put it another way: Unicode is the opposite. It is computers | adopting complexity to serve the needs of humans. 
If we can do | unicode, we can do case-insensitive filename matching. If we're | going to insist on case-sensitivity and ignore the needs of | human language, we should just go back to plain old ASCII and | force the humans to comply with that too. | mixmastamyk wrote: | Try setting the insensitive option in your shell. | | With hidden filenames on mobile, desktop GUIs, terminal CLI | completion, and shells with insensitive matching like fish | (bash via option), the benefits of insensitive fs are not as | high as they used to be. | | Meanwhile, the complexity of matching modern Unicode causes | performance degradation and exposes many edge cases. | | In short, the window of time where insensitive made sense has | largely closed. | [deleted] | qalmakka wrote: | Does case folding in the kernel pass the Turkey test? I.e., are | different locales taken into account in order for the correct | string to be matched? As far as I remember this was a big deal | for supporting Unicode on case insensitive filesystems, because | it then means that a file stops existing depending on the current | locale. | | For instance, take a file named "ivory.txt": `stat("IVORY.TXT")` | on a case-insensitive filesystem would succeed if the locale is | en_US but fail on tr_TR due to the uppercase version of 'i' being | 'İ' there instead. | tzs wrote: | Almost 20 years ago, the place I worked decided it wanted to make | something like Wine, except going the other way--it would run | Linux binaries on Windows. | | We got it working fairly well. We expected that we'd have to put | in some kind of hack to deal with filename case, and I think we | eventually did. | | But before that, when it was still just passing Linux case- | sensitive filenames through to the case-insensitive Windows | filesystem, I tried installing most of whatever was the current | release of Red Hat at the time. | | It almost all worked fine. 
The only thing I remember being a | problem was that some things, such as some Perl modules from | CPAN, had both "makefile" and "Makefile" in the same directory. | colejohnson66 wrote: | Curious: Cygwin's been available since 1995. Why'd your company | go about their own way? | hajile wrote: | Linux systems are more than just a kernel. Does every program | across the system now understand that direct comparison of | strings doesn't work on filenames? Does every regex touching or | analyzing filenames know that they must now be case insensitive? | How long until people stop getting bitten by this issue? | | That aside, inferring semantic meaning based on a few cultures at | this moment in time is a dumb choice. There are tons of such | language-specific semantics that could be implemented and the | result is needing to memorize all those rules about when it does | and doesn't actually matter (and arguing that only certain | cultures and languages should have their own semantics encoded in | the OS is its own problem). | james412 wrote: | This was discussed to death on the kernel mailing lists, you | should go read them. | | The principal question is whether the tool is more important, | or the end user. Why did I pay for this machine if it weren't | intended to facilitate me? That's the bottom line with most of | these kinds of technical "correctness" arguments | | And as for whether userspace should catch up, thanks to OS X | for the most part that already happened a long time ago for a | ton of open source packages | igetspam wrote: | Which has caused countless problems with interpreted | languages and cross platform functionality. Things like ruby | in OSX will gladly let you mangle your include strings on | OSX, which causes a "works fine in Dev" problem. I'm not a | fan of case insensitive filesystems because I have to manage | services. 
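The "makefile" versus "Makefile" collision and the "works fine in Dev" problem described above can be checked for before files ever reach a case-insensitive filesystem. What follows is only a minimal sketch: `find_case_collisions` is a hypothetical helper, and it assumes Python's `str.casefold()` is a close enough stand-in for whatever folding table the target filesystem really uses, which is not guaranteed for ext4, NTFS, or APFS.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Return groups of names that differ only by case.

    Assumption: str.casefold() approximates the filesystem's folding rules.
    A real filesystem uses its own tables, so results may differ at the edges.
    """
    groups = defaultdict(list)
    for name in names:
        # Names sharing a folded key would map to one directory entry.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    listing = ["Makefile", "makefile", "README", "main.c"]
    print(find_case_collisions(listing))
```

Running something like this over a source tree or a tarball listing on the case-sensitive side is one way to learn about a clash before it surfaces on a coworker's Mac.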
| akira2501 wrote: | > That's the bottom line with most of these kinds of | technical "correctness" arguments | | The problem is emergent behavior. We can create any number of | features, but we have a really hard time testing all the | available configurations that result. Engineers rely on | simplicity as a way of warding off this particular problem, | because the other bottom line is people don't want to pay for | a system that loses or invalidates the work they've put into | it. | dietr1ch wrote: | It's not enough to have similar matching at the fs level, | applications need to have the same functionality around, | otherwise anything that indexes the fs will have false | negatives before reading the files. | | Now, if this needs to be taken care of at the application | level, then why have this misfeature? It'd be better to have | a good library for matching this that could be aware of the | language and locale (or maybe multiple lang,locale pairs) | instead of throwing this feature into the fs and call it a | day. | | Also, if some applications benefit from having insensitive | matching, like things built to run on Windows/Mac, then | having a wrapper that fixed the fs access with this matching | library would be enough, no need to force other applications | to use insensitive matching because a single one needs it. | asveikau wrote: | > Why did I pay for this machine if it weren't intended to | facilitate me? | | I happen to agree with the idea that the filename should be a | dumb blob of bytes and the kernel should not do case folding, | as it is the wrong layer for that, eg. the user can change | their language but it won't update what has been written to | the disk in thousands or millions of places where you could | suddenly have a filename collision somewhere based on those | rules changing. | | But, I do hope you get that refund for your Linux. | kochthesecond wrote: | > dumb blob of bytes | | Well, now your filename is invalid utf8. 
How should | programs display it or even address such a file? | jcelerier wrote: | > How should programs display it | | what's wrong with foo.txt | | > or even address such a file? ... by using the array of | bytes ? | ygra wrote: | It's ambiguous, for example. | jcelerier wrote: | so are a file named Hello.txt and another one named | Nello.txt | colejohnson66 wrote: | The fact that if one has two files, say "test{invalid | bytes}.txt" and "test{other invalid bytes}.txt", both have | replacement characters inserted at the same spot and | would decode to the same codepoints. | asveikau wrote: | How does the UI framework act when you set a label to | such payload? How does your web browser act when it sees | it in HTML? I have found working on apps that see a lot | of usage in varied markets that as much as we wish to see | the best and ideal conditions, malformed utf-8 surfaces | in the real world pretty often. | msla wrote: | > Well, now your filename is invalid utf8. | | That's reality. An OS which can't keep up with reality is | broken. | james412 wrote: | > filename should be a dumb blob of bytes | | This hasn't been true since the days of CP/M | asguy wrote: | For e.g. the Linux kernel, besides path separator(s), why | do you think that? | | All of the wide/special-case manipulation when writing | code on e.g. Windows drove me nuts. | ChrisSD wrote: | Out of interest, what special-case manipulation? I | generally treat file paths as opaque `\\` separated | strings (or even as a single blob if I don't need to | parse it). I'm uncertain why I'd want to treat them | specially. I'll leave that to the OS. | cheerlessbog wrote: | I understand that NTFS has its own case folding table which | is written once when the volume is formatted. This does | seem to have stood the test of time and enormous usage so | maybe it is not such a poor idea. 
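The ambiguity raised earlier in this subthread, where two filenames with different invalid bytes end up looking identical once replacement characters are substituted, is easy to reproduce. The sketch below is an illustration under stated assumptions: the byte strings are made-up filenames, and the helper names `lossy`, `lossless`, and `to_bytes` are hypothetical. It contrasts Python's `errors="replace"` decoding with `errors="surrogateescape"`, which keeps the original bytes recoverable.

```python
def lossy(raw):
    """Decode for display; every invalid byte becomes U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def lossless(raw):
    """Decode so that invalid bytes survive as lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text):
    """Recover the original bytes from a surrogateescape-decoded name."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # Two hypothetical filenames; 0xFF and 0xFE are never valid in UTF-8.
    name_a = b"test\xff.txt"
    name_b = b"test\xfe.txt"

    # Displayed with replacement characters, the two names are indistinguishable.
    print(lossy(name_a) == lossy(name_b))

    # With surrogateescape they stay distinct and round-trip to the same bytes.
    print(lossless(name_a) != lossless(name_b))
    print(to_bytes(lossless(name_a)) == name_a)
```

So a program can still address such a file by its bytes, as suggested above, while a user looking at the rendered names cannot tell the two apart.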
| asveikau wrote: | That doesn't sound great if somebody formats your USB | stick in Turkey and suddenly speakers of western European | languages can observe 'i' as case sensitive. | est31 wrote: | That's IIRC how it used to be treated on ext based file | systems until now. Everything allowed except for / and NUL | bytes. | azalemeth wrote: | Arguably the biggest difference that I notice when using | linux as a desktop environment as an end-user is that it | trusts "you", the [sometimes root] user, to a far greater | extent than other operating systems. It is for this reason | that I enjoy it. | | It also means that if you want to do something highly | annoying and unusual, you can - and arguably case-insensitive | filesystems are a subset of that. | snazz wrote: | I agree that case-insensitive filesystems should be an | esoteric feature, but given that they're the default on | Windows and macOS, it should definitely be a well-supported | option on Linux for the sake of compatibility. | magicalhippo wrote: | I've set the Samba shares on my NAS to be case-sensitive, | as making them case-insensitive slows down directory | access by orders of magnitude. | | I've been running this for years accessing them both from | Linux desktops and Windows desktops, and only once have I | had an issue that required me to manually rename | something on the NAS. | | This makes sense as most applications don't care about | the filename, and will just use what you supply, or | generate one and use that string all over. | jra_samba wrote: | Yep, that's true. It's the cache misses that kill | performance. If the client asks for file "Foo", and the | (l)stat fails to find it, then we have to scan the whole | directory looking for any case-differing versions of | "foo" "FOO" "fOo" etc. | | Very costly, but the only way to give case-insensitivity. | jstimpfle wrote: | Could you provide pointers to these discussions? 
As a | technical person, case sensitive filesystems have been a loss | both from a programmer's and also from an end user's | perspective. | ekr wrote: | Here's Torvalds' view on the matter: | https://lwn.net/ml/linux- | fsdevel/CAHk-=wg2JvjXfdZ8K5Tv3vm6+b... | | I also side with this view, namely that this is something | that would be better placed in the userspace rather than | the kernel, which really doesn't need more complexity for | things that add so little value (negative value to some). | agwa wrote: | He must have changed his view because he ultimately | allowed the feature. Does anyone know what changed his | mind? | dm319 wrote: | While Linus kicks off about things, he doesn't tend to | outright refuse things. I don't think he sees himself as | the gatekeeper of the kernel and that is evident in the | way the kernel developed right from the beginning. That | has attracted criticism from the likes of Ken Thompson | who thought that too much crappy code was allowed into | linux. | | |I've looked at the source and there are pieces that are | good and pieces that are not. A whole bunch of random | people have contributed to this source, and the quality | varies drastically | ploxiln wrote: | This feature doesn't facilitate you at all. It's a historical | mistake that macOS and Windows have preserved, and refined a | bit over the years. | | The only real purpose of this is for easier/faster | compatibility with software developed and tested on macOS and | Windows, which has accidental case inconsistency in file name | references in the code, which happens to work fine on macOS | and Windows. | | TFA could have made that argument, but it didn't, it made | incorrect arguments instead. For example, a user might type | in a lower-case name for a file one time, and a capitalized | name for a file another time, and intend to access the same | file. But what user is typing in a whole filename the second | time? 
They're picking it from a list, or if they're very | advanced, using completion in a terminal. TFA also mentions | non-English languages, in the context of unicode | normalization ... but non-western-european languages won't be | handled correctly by any universal case-folding algorithm | anyway, with Turkish being the most common example. | | The whole strategy just doesn't work out well. Many low-level | filesystem developers have known this for over 20 years. It | doesn't work better for anyone, but non-technical people just | aren't aware of why or how it increases complexity and costs, | and reduces performance and reliability. | efdee wrote: | Surely the mistake was case sensitive filenames. I can't | think of a single end-user use case where this behavior is | desirable. | arghwhat wrote: | In a world of ASCII, maybe. But fixing one problem for a | small group of people is a giant can of worms for the | rest of the world. Normalization of compound characters, | exotic character sets, emoji, different classes of | upper/lower-case letters, normalization of compound what- | not. | | And even then it still doesn't fix the issue outside of | the world of ASCII. A filename written in hiragana and | katakana is logically the same to the end-user, but they | are still distinct. Simplified and traditional Chinese, | Hangul and romaja, pinyin, devanagari, thai, and the list | goes on. | | Case-insensitive filenames fixes nothing, but breaks | everything. There is only one sensible thing to do with | arbitrary user-input, and that is to leave it be. | tuatoru wrote: | > There is only one sensible thing to do with arbitrary | user-input, and that is to leave it be. | | I wish programmers would believe this about names and | addresses! My wife has a two-word first name, and I have | a two-word family name. | msla wrote: | Ensuring filenames don't get destroyed by an OS that | refuses to understand a given language. 
Case is a | complicated mess once you leave ASCII, and that's | partially because ASCII is lying to you about how English | case works: Yes, the English language has title case, and | ASCII conflating it with upper-case does not negate that. | | Move on to most anywhere else and the notion that it's | fast, reliable, and safe to convert case gets lost in the | realities of human writing systems. | alerighi wrote: | I disagree. The error is to consider paths as a high level | information, that the user has to know about, rather than what | they really are, a low level information, that potentially | the user never sees (for example consider mobile operating | systems like Android/iOS). | | In practice the case insensitive thing if we want to call it | should be implemented more high level, in the file manager, | rather than in the filesystem itself. That is even what newer | versions of Windows/NTFS do! Recent versions of NTFS are in | fact case sensitive, and if you mount a NTFS volume with | Linux in fact you can create two files with names differing | only by the case: the whole case-insensitive thing is handled | at a higher level in the Windows APIs. | tinus_hn wrote: | Does every program understand filenames with newlines? Or | emoji, backslashes, minus signs or invalid UTF-8 sequences? | pessimizer wrote: | Why would "every program" be the threshold? There is no | internal file format that every program understands, why | should every program be able to deal with every filename? I | could write my program to chunk filenames into words, and | choke on any that it can't find in your local dictionary: | "Spelling-sensitivity." It would be very bad to put that into | the kernel. | | It's easier to keep things simple, and rely on the program to | discard things that it can't or doesn't want to use. If a | program wants to insist that "Bob.txt" and "Robert.txt" are | the same file, that's its prerogative, although it's a | dangerous assumption. 
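The claim earlier in this subthread that case conversion stops being fast, reliable, and safe outside ASCII can be seen with a few characters. This is a rough sketch using Python's built-in, locale-independent Unicode mappings, which are an assumption here and are not the kernel's tables or any filesystem's; `case_report` is a hypothetical helper.

```python
def case_report(text):
    """Show the three standard transformations for one string."""
    return {
        "upper": text.upper(),
        "lower": text.lower(),
        "casefold": text.casefold(),
    }


if __name__ == "__main__":
    samples = [
        "stra\u00dfe",  # German sharp s: uppercasing changes the length
        "\u0131",       # Turkish dotless i: uppercases to plain ASCII "I"
        "\u0130",       # Turkish dotted capital I: lowercases to two code points
    ]
    for sample in samples:
        # ascii() keeps the output printable on any terminal encoding.
        print(ascii(sample), ascii(case_report(sample)))

    # Upper followed by lower does not get you back where you started.
    print("\u0131".upper().lower() == "\u0131")
```

Because these mappings ignore locale, they also sidestep the Turkish question rather than answer it: a Turkish user expects "i" to uppercase to the dotted capital, and no single locale-free table gives both them and an English user the result they expect.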
| m463 wrote: | It took me a long time to figure out why "ls" on one system | sorted my filenames differently than "ls" on my other system. | | (meaning that filename handling with the same program isn't | even consistent with itself) | | Also, what happens when you backup very-important-file.txt | and VeryImportantFile.txt and then restore to a case- | insensitive filesystem? I'll tell you - unexpected things. | oarsinsync wrote: | > Also, what happens when you backup very-important- | file.txt and VeryImportantFile.txt and then restore to a | case-insensitive filesystem? | | Both files get restored as independent files, as case isn't | the only difference in those files. | | (Did you mean very-important-file.txt and Very-Important- | File.txt?) | jasoneckert wrote: | Case sensitivity is actually a powerful feature of Unix systems | as it allows for multiple valid string variants of a single word. | | For example, common Unix convention in the 1980s-1990s was to | name user-created directories with a capital to make them easier | to see in a regular directory listing without color (e.g. Poems | is a directory, while poems is just a file). | | I've used it heavily with content revision (e.g. CATHENA02.yaml | is the second cathena configuration file in testing while | cathena02.yaml is the production version of it). | | Plus, making a filename case-insensitive for processing purposes | in a scripting language is very easy. | | Consequently, I can't imagine a reason why I'd use case- | insensitivity in ext4. | kasabali wrote: | > The case insensitivity is just a horribly bad idea, and Apple | could have pushed fixing it. They didn't. Instead, they doubled | down on a bad idea, and actively extended it - very very badly - | to unicode. And it's not even UTF-8, it's UCS2 I think. | | > There's some excuse for case insensitivity in a legacy model | ("We didn't know better"). 
But people who think unicode | equivalency comparisons are a good idea in a filesystem shouldn't | be allowed to play in that space. Give them some paste, and let | them sit in a corner eating it. They'll be happy, and they won't | be messing up your system. | | Linus Torvalds, 2014, | https://web.archive.org/web/20150112214037/https://plus.goog... | colejohnson66 wrote: | The problem with case sensitivity is UX. Sure, one can explain | to someone that "A" and "a" are different to a computer, but | will they _understand_? | ornxka wrote: | They're different to a human, why would it be any more | difficult to understand that they're different for computers | too? Trying to make things easier to understand by | introducing complicated mechanisms just makes the whole thing | harder to use. | klodolph wrote: | The computer should be more complicated to make things | easier for humans. | | "Report (final).docx" vs "Report (Final).docx" | kevincox wrote: | It gets even worse when you consider that you can encode | visually identical characters with different combinations of | unicode codepoints. | | But that being said, are many nontechnical users typing | filenames anyways? | naniwaduni wrote: | Yes, and if they don't, introduce them to a 5-year-old. | | That doesn't mean they won't still have trouble working with | filenames that differ only by case, and it can still be | pretty hard to explain why two characters that are visually | similar are distinct to the computer. | | But the idea that letters that look different are different | is not a hard one to understand. It's the default, and one | that people have to actively unlearn. | callesgg wrote: | Yes, my experience is that people understand well enough. ___________________________________________________________________ (page generated 2020-08-30 23:01 UTC)