Trying to understand the Ruby .chr and .ord methods

I've been working with the Ruby chr and ord methods recently and there are a few things I don't understand.

My current project involves converting individual characters to and from ordinal values. As I understand it, if I have a string with an individual character like "A" and I call ord on it, I get its position in the ASCII table, which is 65. Calling the inverse, 65.chr, gives me the character "A". This tells me that Ruby has an ordered collection of character values somewhere, and that it can use this collection to give me the position of a specific character or the character at a specific position. I may be wrong on this; please correct me if I am.

Now, I also understand that Ruby's default character encoding is UTF-8, so it can work with thousands of possible characters. Thus, if I ask it for something like this:

'好'.ord

I get the position of that character, which is 22909. However, if I call chr on that value:

22909.chr

I get "RangeError: 22909 out of char range". I'm only able to get chr to work on values up to 255, which is extended ASCII. So my questions are:

  • Why does Ruby seem to be getting values for chr from the extended ASCII character set but ord from UTF-8?
  • Is there any way to tell Ruby to use different encodings when it uses these methods? For instance, tell it to use ASCII-8BIT encoding instead of whatever it's defaulting to?
  • If it is possible to change the default encoding, is there any way of getting the total number of characters available in the set being used?

  • According to the Integer#chr documentation, you can pass an encoding as an argument to force the result to be UTF-8:

    22909.chr(Encoding::UTF_8)
    #=> "好"
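
On the first question: ord reads the code point using the encoding of the string it is called on, while chr without an argument has no string to consult, so it returns a US-ASCII string for 0..127 and an ASCII-8BIT string for 128..255. The sketch below illustrates this; the variable names are only for illustration, and the character is written as an escape so the example does not depend on the source file's encoding.

```ruby
ch   = "\u597D"               # the character 好 as a UTF-8 string
code = ch.ord                 # ord uses the string's own encoding
back = code.chr(ch.encoding)  # pass the same encoding to chr for the reverse trip

puts ch.encoding              # UTF-8
puts code                     # 22909
puts back == ch               # true

# Without an argument, chr picks the encoding from the integer's value.
puts 65.chr.encoding == Encoding::US_ASCII      # true
puts 200.chr.encoding == Encoding::ASCII_8BIT   # true
```

Passing the string's own encoding back to chr keeps the two methods symmetric for any encoding, not only UTF-8.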
    

    To list all available encoding names:

    Encoding.name_list
    #=> ["ASCII-8BIT", "UTF-8", "US-ASCII", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "UTF-16", "UTF-32", ...]
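
On the second question, a minimal sketch, assuming Encoding.default_internal has not already been set by your environment: when chr is called without an argument on a value above 255, it falls back to Encoding.default_internal and raises RangeError only if that is nil. Setting it changes other I/O behavior too, so the example restores the previous value afterwards.

```ruby
previous = Encoding.default_internal   # normally nil unless configured

begin
  # With no default internal encoding, values above 255 are rejected.
  begin
    22909.chr
  rescue RangeError => e
    puts "no default set: #{e.message}"
  end

  # With a default internal encoding, the same call succeeds.
  Encoding.default_internal = Encoding::UTF_8
  with_default = 22909.chr
  puts with_default.encoding             # UTF-8
  puts with_default == "\u597D"          # true
ensure
  Encoding.default_internal = previous   # put the global setting back
end
```

For a one-off conversion, passing the encoding to chr directly is usually the less invasive choice.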
    

    A hacky way to count the characters (valid code points) available in an encoding:

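    # Count the integers below 2,000,000 that chr accepts as UTF-8 code points.
    2_000_000.times.count do |i|
      begin
        i.chr(Encoding::UTF_8)
        true
      rescue RangeError
        false # not a valid code point in this encoding
      end
    end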
    #=> 1112064
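
The figure 1112064 can be cross-checked with simple arithmetic: Unicode code points run from U+0000 to U+10FFFF, and the surrogate range U+D800..U+DFFF cannot be encoded in UTF-8. The variable names below are only for illustration.

```ruby
total_code_points = 0x10FFFF + 1             # U+0000 through U+10FFFF
surrogates        = (0xD800..0xDFFF).size    # reserved for UTF-16, invalid in UTF-8
encodable         = total_code_points - surrogates

puts encodable                               # 1112064

# A surrogate is rejected by chr, which is why the loop above skips it.
surrogate_rejected =
  begin
    0xD800.chr(Encoding::UTF_8)
    false
  rescue RangeError
    true
  end
puts surrogate_rejected                      # true
```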
    

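    After tooling around with this for a while, I realized that I could find the highest valid code point for each encoding by running a binary search for the largest value that doesn't raise a RangeError. Note that this is the highest code point rather than an exact character count: an encoding can have gaps below that value, such as the surrogate range in UTF-8.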

    def get_highest_value(set)
      min = 0
      max = 10_000_000_000

      # Binary search for the largest integer that chr accepts for this encoding.
      while min <= max
        guess = (min + max) / 2
        begin
          guess.chr(set)
          min = guess + 1 # guess is valid, so look higher
        rescue RangeError
          max = guess - 1 # guess is invalid, so look lower
        end
      end

      max
    end
    

    The argument passed to the method is the encoding being checked, given either as a name such as "UTF-8" or as an Encoding constant.
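
The same search can also be sketched with Ruby's built-in Range#bsearch instead of a hand-written loop. This is an alternative, not the code above: the names valid_codepoint? and highest_codepoint are hypothetical, and it assumes valid code points form one contiguous run starting at 0, which holds only approximately for UTF-8 because of the surrogate gap (the search happens not to land in it).

```ruby
# True if chr accepts the integer i for the given encoding.
def valid_codepoint?(i, enc)
  i.chr(enc)
  true
rescue RangeError
  false
end

# Find the first integer chr rejects, then step back by one.
def highest_codepoint(enc, upper = 0xFFFF_FFFF)
  first_invalid = (0..upper).bsearch { |i| !valid_codepoint?(i, enc) }
  first_invalid ? first_invalid - 1 : upper
end

puts highest_codepoint(Encoding::UTF_8)        # 1114111 (0x10FFFF)
puts highest_codepoint(Encoding::ASCII_8BIT)   # 255
puts highest_codepoint(Encoding::US_ASCII)     # 127
```

Range#bsearch in find-minimum mode expects the block to flip from false to true exactly once, which is why the contiguity assumption matters.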
