Trying to understand the Ruby .chr and .ord methods

2018-06-12 03:32:18

I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.

My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a single-character string like "A" and call ord on it, I get its position in the ASCII table, which is 65. Calling the inverse, 65.chr, gives me the character "A". This tells me that Ruby has an ordered collection of character values somewhere, and that it can use this collection to give me the position of a specific character, or the character at a specific position. I may be wrong about this; please correct me if I am.
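For reference, the round trip described above can be sketched as follows. The variable names are only illustrative; ord reports the code point of the first character in the string's encoding, and chr builds a one-character string from a code point.

```ruby
# String#ord returns the code point of the first character;
# Integer#chr turns a code point back into a one-character string.
code = "A".ord
char = code.chr

puts code  # 65
puts char  # A

# ord only looks at the first character of a longer string.
first_only = "ABC".ord
puts first_only  # 65
```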

Now I also understand that Ruby's default character encoding is UTF-8, so it can work with thousands of possible characters. Thus if I ask it for something like this:

'好'.ord

I get the position of that character which is 22909. However, if I call chr on that value:

22909.chr

I get "RangeError: 22909 out of char range." I'm only able to get chr to work on values up to 255, which is the extended ASCII range. So my questions are:

Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?

Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?

If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?

According to the documentation for Integer#chr, you can pass an encoding as an argument, for example to force UTF-8:

22909.chr(Encoding::UTF_8)
#=> "好"
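This also explains the mismatch in the question. As far as the Integer#chr documentation goes, calling chr with no argument gives a US-ASCII string for 0..127 and an ASCII-8BIT (binary) string for 128..255; larger values are only accepted when Encoding.default_internal is set. ord, on the other hand, follows the encoding of the string it is called on. A minimal sketch, assuming Encoding.default_internal is nil (the usual default) and writing the character from the question as the escape "\u597D":

```ruby
# Assumption: Encoding.default_internal is nil, as it is unless configured.
ascii  = 65.chr
binary = 200.chr
utf8   = 22909.chr(Encoding::UTF_8)

puts ascii.encoding == Encoding::US_ASCII     # true
puts binary.encoding == Encoding::ASCII_8BIT  # true
puts utf8 == "\u597D"                         # true
puts utf8.ord                                 # 22909

# Without an argument, a value above 255 raises RangeError
# (unless a default internal encoding has been set).
bare = begin
  22909.chr
rescue RangeError => e
  e.message
end
puts bare
```

So the round trip works as long as the same encoding is used on both sides.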

To list all available encoding names:

Encoding.name_list
#=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]
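Any of these names can be used to pick an encoding. The sketch below is only an illustration (the choice of ISO-8859-1 and the value 233 are mine, not from the question): it looks an encoding up by name, passes a name straight to chr, and transcodes the result to UTF-8.

```ruby
# Encoding.find looks up an encoding by name (case-insensitive).
utf8 = Encoding.find("utf-8")
puts utf8 == Encoding::UTF_8  # true

# Integer#chr accepts a name as well as an Encoding constant.
latin1 = 233.chr("ISO-8859-1")
puts latin1.encoding == Encoding::ISO_8859_1  # true
puts latin1.bytes.inspect                     # [233]

# Transcoding to UTF-8 keeps the character but changes the bytes.
as_utf8 = latin1.encode("UTF-8")
puts as_utf8.bytes.inspect  # [195, 169]
puts as_utf8.ord            # 233
```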

A hacky way to count the values that are valid characters in an encoding:

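# Count every value below 2,000,000 that chr accepts as UTF-8.
2_000_000.times.count do |i|
  begin
    i.chr(Encoding::UTF_8)
    true
  rescue RangeError
    # Not a valid UTF-8 code point; leave it out of the count.
    false
  end
end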
#=> 1112064
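That figure can be cross-checked without looping. Unicode code points run from 0 to 0x10FFFF, and the surrogate range 0xD800..0xDFFF is reserved for UTF-16 and rejected by chr. A small sketch of the arithmetic, with a hypothetical helper utf8_char? used only to spot-check the boundaries:

```ruby
unicode_max = 0x10FFFF          # highest Unicode code point
surrogates  = (0xD800..0xDFFF)  # reserved for UTF-16, not valid characters

valid_count = (unicode_max + 1) - surrogates.size
puts valid_count  # 1112064

# Hypothetical helper: true when chr accepts the value as UTF-8.
def utf8_char?(code)
  code.chr(Encoding::UTF_8)
  true
rescue RangeError
  false
end

puts utf8_char?(0xD7FF)    # true
puts utf8_char?(0xD800)    # false
puts utf8_char?(0x10FFFF)  # true
puts utf8_char?(0x110000)  # false
```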

After tooling around with this for a while, I realized that I could find the upper limit for each encoding by running a binary search for the highest value that doesn't raise a RangeError.

def get_highest_value(set)
  min = 0
  max = 10_000_000_000

  # Narrow [min, max] until the bounds cross; max then holds the
  # highest value that chr accepted.
  while min <= max
    guess = (min + max) / 2
    begin
      guess.chr(set)
      min = guess + 1  # guess is valid, so search higher
    rescue RangeError
      max = guess - 1  # guess is out of range, so search lower
    end
  end

  max
end

The argument passed to the method is the encoding to check, given either as an Encoding constant such as Encoding::UTF_8 or as a name such as "UTF-8".
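The same idea can also be sketched with Ruby's built-in Range#bsearch instead of a hand-written loop. This is an alternative sketch, not the method above: the name highest_codepoint and the default limit are assumptions of mine. Like any binary search, it assumes that once a value is rejected, every larger value is rejected too. Note that it returns the highest accepted value, not a count of characters.

```ruby
# Hypothetical helper: returns the highest value that chr accepts for
# the given encoding, or nil if nothing up to limit is rejected.
def highest_codepoint(encoding, limit = 0xFFFF_FFFF)
  # In find-minimum mode, bsearch returns the first value for which
  # the block is true, here the first value that chr rejects.
  first_invalid = (0..limit).bsearch do |code|
    begin
      code.chr(encoding)
      false
    rescue RangeError
      true
    end
  end
  first_invalid && first_invalid - 1
end

puts highest_codepoint(Encoding::UTF_8)       # 1114111
puts highest_codepoint(Encoding::ASCII_8BIT)  # 255
```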
