Sorting the characters in a UTF-16 string in Java
tl;dr
Java needs 2 chars (a surrogate pair) to represent some characters in UTF-16. Sorting the char[] with Arrays.sort messes with the char sequencing, because it splits those pairs apart. Should I convert char[] to int[], or is there a better way?

Details
Java represents strings as UTF-16, but the Character class itself wraps a single char (16 bits). A character outside the Basic Multilingual Plane, such as an emoji, takes an array of 2 chars (32 bits).
Sorting a String of such characters using the built-in sort messes with the data.
(Arrays.sort uses dual-pivot quicksort for primitive arrays, and Collections.sort uses Arrays.sort to do the heavy lifting.)
To be specific, do you convert char[] to int[], or is there a better way to sort?

    import java.util.Arrays;

    public class Main {
        public static void main(String[] args) {
            int[] utfCodes = {128513, 128531, 128557};
            String emojis = new String(utfCodes, 0, 3);
            System.out.println("Initial String: " + emojis);

            // Sorts individual 16-bit chars, not whole code points
            char[] chars = emojis.toCharArray();
            Arrays.sort(chars);
            System.out.println("Sorted String: " + new String(chars));
        }
    }

Output:
Initial String: 😁😓😭
Sorted String: ??😁??
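To see why the output above contains unpaired characters, it can help to print the individual UTF-16 code units before and after the sort. The following is only an illustrative sketch; the class name SurrogateDump and the helper hex are made up for this example and are not part of the original question.

```java
import java.util.Arrays;

// Hypothetical helper class, named only for this illustration.
class SurrogateDump {
    // Formats each 16-bit char of the array as four uppercase hex digits.
    static String hex(char[] chars) {
        StringBuilder sb = new StringBuilder();
        for (char c : chars) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] utfCodes = {128513, 128531, 128557};
        char[] chars = new String(utfCodes, 0, 3).toCharArray();

        // Each emoji is a high surrogate (D83D) followed by a low surrogate.
        System.out.println("Before: " + hex(chars)); // D83D DE01 D83D DE13 D83D DE2D

        // Sorting by char value moves all high surrogates to the front.
        Arrays.sort(chars);
        System.out.println("After:  " + hex(chars)); // D83D D83D D83D DE01 DE13 DE2D
    }
}
```

After the sort, only the third D83D is still directly followed by a low surrogate (DE01), which is why a single emoji survives and the other four chars print as "?".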
java string sorting utf-16
asked 2 hours ago by dingy (new contributor)
edited 2 hours ago by jtahlborn
This is what we call a "Collation". You should use a library for this because there are many collations to choose from.
– Guillaume F.
2 hours ago
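If locale-aware ordering (collation) is what is wanted rather than plain numeric code point order, one possible sketch uses the JDK's own java.text.Collator. The class name CollatedSort, the method sortByCollation, and the choice of Locale.ENGLISH are assumptions made for this example; the comment above does not name a specific library.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical class, named only for this illustration.
class CollatedSort {
    // Splits s into single-code-point strings, orders them with the
    // locale's collation rules, and joins them back together.
    static String sortByCollation(String s, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        return s.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .sorted(collator)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // The locale is an assumption; pick whichever collation fits the data.
        System.out.println(sortByCollation("banana", Locale.ENGLISH));
    }
}
```

Because each element handed to the collator is a whole code point, surrogate pairs stay intact regardless of the ordering the collator chooses.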
3 Answers
I looked around for a bit and couldn't find any clean way to sort an array by groupings of two elements without the use of a library.
Luckily, the codePoints of the String are what you used to create the String itself in this example, so you can simply sort those and create a new String with the result.

    public class Main {
        public static void main(String[] args) {
            int[] utfCodes = {128531, 128557, 128513};
            String emojis = new String(utfCodes, 0, 3);
            System.out.println("Initial String: " + emojis);

            // Sort whole code points instead of individual chars
            int[] codePoints = emojis.codePoints().sorted().toArray();
            System.out.println("Sorted String: " + new String(codePoints, 0, 3));
        }
    }

Initial String: 😓😭😁
Sorted String: 😁😓😭

I switched the order of the characters in your example because they were already sorted.
answered 2 hours ago by Jacob G. (edited 1 hour ago)
We can't use char for Unicode characters, because a Java char holds only a 16-bit UTF-16 code unit, not a full code point.
In the early days of Java, Unicode code points always fit in 16 bits (exactly one char). However, the Unicode specification later changed to allow supplementary characters. That means Unicode characters are now variable width in UTF-16 and can take more than one char. Unfortunately, it was too late to change Java's char implementation without breaking a ton of production code.
So the best way to manipulate Unicode characters is by using code points directly, e.g., using String.codePointAt(index) or the String.codePoints() stream on JDK 1.8 and above.
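As a small illustration of the difference between chars and code points, the sketch below compares the two views of the same string. The class name CodePointBasics and the helper codePointLength are invented for this example; the String and Character methods it calls are standard JDK APIs.

```java
// Hypothetical class, named only for this illustration.
class CodePointBasics {
    // Counts Unicode code points rather than UTF-16 code units.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "a" followed by U+1F601, which needs a surrogate pair.
        String s = new String(new int[] {'a', 128513}, 0, 2);

        System.out.println(s.length());          // 3 (chars)
        System.out.println(codePointLength(s));  // 2 (code points)

        // charAt(1) sees only the first half of the pair ...
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        // ... while codePointAt(1) reads the whole code point.
        System.out.println(Integer.toHexString(s.codePointAt(1)));  // 1f601

        // Java 8+: stream over whole code points.
        s.codePoints().forEach(cp ->
                System.out.println("U+" + Integer.toHexString(cp).toUpperCase()));
    }
}
```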
answered 1 hour ago by peekay (new contributor)
If you are using Java 8 or later, then this is a simple way to sort the characters in a string while respecting (not breaking) multi-char code points:

    int[] codepoints = someString.codePoints().sorted().toArray();
    String sorted = new String(codepoints, 0, codepoints.length);

Prior to Java 8, I think you either need to use a loop to iterate the code points in the original string, or use a 3rd-party library method.
Fortunately, sorting the code points in a String is uncommon enough that the clunkiness and inefficiency of the solutions above are rarely a concern.
(When was the last time you tested for anagrams of emojis?)
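For the pre-Java 8 case, the loop over code points mentioned above might look something like the following sketch. It relies only on methods that have been in the JDK since Java 5 (codePointCount, codePointAt, Character.charCount); the class name CodePointSortLegacy and the method sortCodePoints are made up for this example.

```java
import java.util.Arrays;

// Hypothetical class, named only for this illustration.
class CodePointSortLegacy {
    // Sorts the code points of s without using streams.
    static String sortCodePoints(String s) {
        int[] codePoints = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            codePoints[n++] = cp;
            // Advance by 1 char for BMP characters, 2 for surrogate pairs.
            i += Character.charCount(cp);
        }
        Arrays.sort(codePoints);
        return new String(codePoints, 0, n);
    }

    public static void main(String[] args) {
        String emojis = new String(new int[] {128531, 128557, 128513}, 0, 3);
        System.out.println("Initial String: " + emojis);
        System.out.println("Sorted String: " + sortCodePoints(emojis));
    }
}
```

The int[] is sorted numerically by code point value, so the result matches what codePoints().sorted() would produce on Java 8 and later.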
answered 1 hour ago by Stephen C (edited 20 mins ago)