
I've got a comma-separated file that looks like this:

100,00869184
6492,8361
1234,31
200,04071

I want to use sort to sort this file numerically by the first column only.

Desired Result:

100,00869184
200,04071
1234,31
6492,8361

How do I achieve this using sort? It seems like the commas are being treated like thousands separators instead of delimiters even when I call them out as such.

Both sort -t',' -n and sort -t',' -nk1 give me this:

1234,31
200,04071
6492,8361
100,00869184

Sorting by the default (no parameters) or using sort -t',' gives me this:

100,00869184
1234,31
200,04071
6492,8361

And sorting numerically with sort -n gives me this:

1234,31
200,04071
6492,8361
100,00869184

How can I use sort to achieve my desired result?

Edited to add: This is for a one-time operation to create a sorted list of about 7 million lines, so workarounds or other unorthodox methods are perfectly acceptable.

  • the examples I see here seem to show the -t option as having a space between the -t and the character Commented Aug 27, 2012 at 16:28
  • First thought - use cut. It selects only a particular column, based on a given separator. Also "Artem Ice"'s answer with tr. I love tr. I am too lazy to write and test this, though. Cheers! Commented Sep 4, 2012 at 11:59
  • Possible same for tab char: stackoverflow.com/questions/1037365/… Commented Apr 1, 2015 at 23:23

4 Answers

This is certainly a dirty workaround, but I figured out a way to do this thanks to @slhck's tip about locales. If a better answer comes along that would be more helpful to others, I'll certainly accept it since this pretty much only works for my specific problem.

I set the locale to Spanish (Bolivian) so that the commas were treated like decimal points, then standard numeric sorting did the trick.

$ export LC_NUMERIC="es_BO.utf8"
$ cat test.csv
100,00869184
6492,8361
1234,31
200,04071
$ sort -n test.csv
100,00869184
200,04071
1234,31
6492,8361
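If the es_BO.utf8 locale isn't installed on your system, the same effect can be sketched without it: swap each comma for a dot, sort numerically in the C locale, then swap the dots back. This is a swapped-in technique rather than the locale trick itself, and it assumes the file contains no other '.' characters; the function name sort_comma_decimal is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sorts comma-decimal numbers read from stdin.
# 1. tr turns each ',' into '.', so 100,00869184 becomes 100.00869184
# 2. sort -n in the C locale reads '.' as the decimal point
# 3. tr turns the '.' back into ','
# Assumption: the input has no '.' characters of its own.
sort_comma_decimal() {
    tr ',' '.' | LC_ALL=C sort -n | tr '.' ','
}

# Example with the four sample lines from the question
printf '%s\n' 100,00869184 6492,8361 1234,31 200,04071 | sort_comma_decimal
```

For the real file, something like sort_comma_decimal < test.csv > sorted.csv should work the same way.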
  • Ah see, I would have suggested using a German locale or similar. Can't think of anything else right now without being able to test it or what tools you have available, since this is a pretty rare Unix version. Commented Aug 27, 2012 at 16:33
  • @slhck That seems to be the crux of most of the problems I get stuck on in UNIX :) Thanks for your help in getting me to a solution, regardless. Commented Aug 27, 2012 at 16:38

GNU's sort does this by default:

$ cat test
100,00869184
6492,8361
1234,31
200,04071
$ gsort -nt',' < test
100,00869184
200,04071
1234,31
6492,8361

Version:

$ gsort --version
sort (GNU coreutils) 8.19

There's a caveat, though: if your sort does not work as expected, your locale is probably set to something other than C. The locale defines how letters, numbers, and decimal characters are interpreted and ordered.

To check this, run locale in a terminal. Is LC_NUMERIC set to en_US.UTF-8, maybe? That locale uses the comma as a thousands separator, which would explain the wrong sort order. Set it back to C:

export LC_NUMERIC=C 

Then try your sort command again. To set every locale category (not just LC_NUMERIC) to C for the current shell, use:

export LC_ALL=C 
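For reference, a minimal sketch of a first-field-only sort, assuming a POSIX-compliant sort that supports -k (not verified on HP-UX). The -k1,1 form ends the key at field 1, so the comma never enters the key; a bare -k1, as in the question, runs to the end of the line, and without -k the whole line is the key, so -t',' alone changes nothing. LC_ALL=C is kept as an extra safeguard. The function name sort_by_first_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: numeric sort on the first comma-separated field only.
# -t','   fields are separated by commas
# -k1,1n  the key starts AND ends at field 1, compared numerically
# LC_ALL=C is an extra safeguard so no locale-specific separators apply.
# Reads the files given as arguments, or stdin if there are none.
sort_by_first_field() {
    LC_ALL=C sort -t',' -k1,1n "$@"
}

printf '%s\n' 100,00869184 6492,8361 1234,31 200,04071 | sort_by_first_field
```

Called with a file name it reads that file instead, e.g. sort_by_first_field test.csv > sorted.csv.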
  • I don't have access to GNU in my environment. Is it something I could easily get then remove when I am finished? HMU in chat if someone would like to help me do this... I'm quite the UNIX newbie. Commented Aug 27, 2012 at 16:08
  • I'm pretty sure it's just a locale issue. But what's sort --version for you, actually? Commented Aug 27, 2012 at 16:11
  • sort --version gives me an illegal argument. -- commands haven't worked for me in the past either. I checked the man page and there's no version called out explicitly, but it does list "HP-UX 11i Version 2: August 2003" if that helps at all. My LC_NUMERIC is set to "C". Commented Aug 27, 2012 at 16:14
  • German locale for example would use , as a decimal separator. I've never used HP-UX though. Commented Aug 27, 2012 at 16:32

Try adding the -g option, which is supposed to perform (general) numeric sorting.

Try:

sort -t',' -g <whatever> 
  • Isn't -n numeric sorting? -g gives me an illegal option. Commented Aug 27, 2012 at 15:49
  • -g is the general-numeric-sort option and should actually be available in any recent version of sort. @dpatchery Commented Aug 27, 2012 at 16:00
  • This is at my place of work so I almost definitely do not have a recent version :) Commented Aug 27, 2012 at 16:04

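Replace the delimiter with a space, so that sort -n stops reading each number at the space instead of treating the comma as a thousands separator:

tr ',' ' ' < commafile | sort -n

This should help you.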
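The command above leaves spaces where the commas were. A small sketch that restores them with a second tr, assuming the data contains no spaces of its own (any existing space would come out as a comma); sort_via_tr is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: the tr approach, with the commas put back afterwards.
# Assumption: the data contains no spaces of its own, since every space
# is turned into a comma by the last tr.
sort_via_tr() {
    # C locale, so no locale-specific thousands separator can interfere
    tr ',' ' ' | LC_ALL=C sort -n | tr ' ' ','
}

printf '%s\n' 100,00869184 6492,8361 1234,31 200,04071 | sort_via_tr
```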
