Idris-dev/rts/idris_utf8.h
Edwin Brady c4132807f0 String in C is now UTF8 encoded
Primitives for head/tail/index/cons/reverse/length now all assume the
char* is UTF8 encoded.  Also updated generation of literals to encode as
UTF8.  Primitives are probably not as efficient as they could be (though
some of them will be used rarely).

ASCII strings will work exactly as before.

Everything I know about UTF8 encoding has been learned in the past few
hours. Therefore, this is unlikely to be the best way to do this. Please
educate me, ideally in the form of annotated Pull Requests :).
2015-03-28 17:13:59 +00:00

#ifndef _IDRIS_UTF8
#define _IDRIS_UTF8
/* Various functions for dealing with UTF8 encoding. These are probably
not very efficient (and I, EB, am making no guarantees about their
correctness). Nevertheless, they mean that we can treat Strings as
UTF8. Patches welcome :). */
// Get the length of a UTF8 encoded string in characters (not bytes)
int idris_utf8_strlen(char *s);
// Get the number of bytes taken by the first character of a string
int idris_utf8_charlen(char* s);
// Return the integer value of the character at character index j.
// Assumes j is in bounds.
unsigned idris_utf8_index(char* s, int j);
// Encode a character, given as an integer, as a UTF8 byte sequence.
// The result is null terminated; the caller is responsible for freeing it.
char* idris_utf8_fromChar(int x);
// Reverse a UTF8 encoded string by characters (not bytes), putting the
// result in 'result'
char* idris_utf8_rev(char* s, char* result);
#endif
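The header only declares these primitives. To illustrate the encoding rules they depend on, here is a minimal sketch of how they could be implemented. It is not the Idris RTS's actual implementation. It assumes well-formed UTF8 input and code points no larger than U+10FFFF, and it performs no validation.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only: assumes well-formed UTF8 and does no validation. */

/* Number of bytes in the character starting at s, read from its lead byte. */
int idris_utf8_charlen(char* s) {
    unsigned char c = (unsigned char)s[0];
    if (c < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    return 4;                          /* 11110xxx */
}

/* Count characters by counting every byte that is not a continuation byte (10xxxxxx). */
int idris_utf8_strlen(char* s) {
    int n = 0;
    for (; *s; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) ++n;
    }
    return n;
}

/* Decode the code point of the j-th character (0-based). Assumes j is in bounds. */
unsigned idris_utf8_index(char* s, int j) {
    for (int i = 0; i < j; ++i) s += idris_utf8_charlen(s);  /* skip j characters */
    unsigned char* p = (unsigned char*)s;
    int len = idris_utf8_charlen(s);
    if (len == 1) return p[0];
    unsigned cp = p[0] & (0xFF >> (len + 1));    /* payload bits of the lead byte */
    for (int k = 1; k < len; ++k) cp = (cp << 6) | (p[k] & 0x3F);  /* 6 bits per continuation byte */
    return cp;
}

/* Encode code point x as a null-terminated UTF8 string; the caller frees it. */
char* idris_utf8_fromChar(int x) {
    char* out = malloc(5);                       /* at most 4 bytes plus the terminator */
    if (out == NULL) return NULL;
    unsigned u = (unsigned)x;
    int n;
    if (u < 0x80) {
        out[0] = (char)u; n = 1;
    } else if (u < 0x800) {
        out[0] = (char)(0xC0 | (u >> 6));
        out[1] = (char)(0x80 | (u & 0x3F)); n = 2;
    } else if (u < 0x10000) {
        out[0] = (char)(0xE0 | (u >> 12));
        out[1] = (char)(0x80 | ((u >> 6) & 0x3F));
        out[2] = (char)(0x80 | (u & 0x3F)); n = 3;
    } else {
        out[0] = (char)(0xF0 | (u >> 18));
        out[1] = (char)(0x80 | ((u >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((u >> 6) & 0x3F));
        out[3] = (char)(0x80 | (u & 0x3F)); n = 4;
    }
    out[n] = '\0';
    return out;
}

/* Reverse by characters: copy each multi-byte sequence whole, filling result
   from the back. result must have room for strlen(s) + 1 bytes. */
char* idris_utf8_rev(char* s, char* result) {
    char* end = result + strlen(s);
    *end = '\0';
    while (*s) {
        int n = idris_utf8_charlen(s);
        end -= n;
        memcpy(end, s, (size_t)n);
        s += n;
    }
    return result;
}
```

In this sketch, character length and indexing are linear scans because UTF8 characters vary in width. Reversal copies whole byte sequences rather than individual bytes, which keeps multi-byte characters intact. ASCII bytes are their own one-byte encodings, so ASCII strings behave exactly as they would byte by byte.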