Parsing text by regex, split, tokenize, or hash

I am parsing a CSV file that contains text that represents duration, which might be any combination of hours, minutes, or both. For example:

  • "1 hour 30 minutes"
  • "2 hours"
  • "45 minutes"

    I want to be able to compute duration = h.hours + m.minutes, where h is the number of hours (if present) and m is the number of minutes (if present).

    I tried solving this with the regex /(\d*)\s?hour\D*(\d*)\s?min/, but it won't match minutes alone or hours alone.

    So I changed it to /(\d+)\s?\D*\s?(\d*)/, but that is wrong too: there is no way to tell whether a captured value is hours or minutes, so I cannot convert it with the right unit.

    I am confused about which approach would solve this problem in my app. Is it a regex, a hash, some other kind of matching, or something else? Any help or advice is appreciated.


    This is pretty straightforward to match with a regex if you know that at least one of the two units is present in the string. For example:

    (?:(\d+)\s*hours?)?\s*(?:(\d+)\s*minutes?)?
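
As a rough sketch of how that pattern could be used in Ruby: both groups are optional, so the pattern also matches an empty string, which is why the check for "at least one present" matters. The helper name parse_duration_minutes, the constant DURATION_PATTERN, the \A...\z anchors, and the /i flag are assumptions added for illustration and are not part of the original answer.

```ruby
# Hypothetical helper built around the pattern above.
# Anchors and the case-insensitive flag are additions for this sketch.
DURATION_PATTERN = /\A\s*(?:(\d+)\s*hours?)?\s*(?:(\d+)\s*minutes?)?\s*\z/i

# Returns the total duration in minutes, or nil when the text holds
# neither an hour nor a minute component.
def parse_duration_minutes(text)
  match = DURATION_PATTERN.match(text)
  return nil if match.nil?

  hours, minutes = match.captures
  # Both groups are optional, so an empty string still matches; reject it.
  return nil if hours.nil? && minutes.nil?

  # nil.to_i is 0, so a missing unit contributes nothing.
  hours.to_i * 60 + minutes.to_i
end

["1 hour 30 minutes", "2 hours", "45 minutes"].each do |text|
  puts "#{text} => #{parse_duration_minutes(text)} minutes"
end
```

Because the first capture group is always hours and the second is always minutes, there is no ambiguity about which unit a number belongs to.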
    

    Here's one fancy way:

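    def string_to_duration(string)
      # Each match is a [number, unit] pair, e.g. ["30", "minutes"]
      string.downcase.scan(/(\d+)\s+(hours?|minutes?)/).map do |number, unit|
        # Integer#hours and Integer#minutes come from ActiveSupport
        number.to_i.send(unit)
      end.reduce(:+)
    end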
    

    Test:

    require "active_support/all"
    
    input = [
      "1 hour 30 minutes",
      "2 hours",
      "45 minutes"
    ]
    
    def string_to_duration(string)
      string.downcase.scan(/(\d+)\s+(hours?|minutes?)/).map do |number, unit|
        number.to_i.send(unit)
      end.reduce(:+)
    end
    
    input.each do |str|
      puts string_to_duration str
    end
    

    Output:

    5400
    7200
    2700
    

    Note: This also accepts duplicate units. For example, "1 minute 1 minute 1 minute" prints 180.
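
If duplicate units should be rejected instead, one possible variation is sketched below. It uses plain Ruby with a lookup table rather than ActiveSupport and send; the names SECONDS_PER_UNIT and strict_duration_seconds are hypothetical, and raising ArgumentError on duplicates is just one reasonable policy.

```ruby
# Hypothetical lookup table: seconds represented by one of each unit.
SECONDS_PER_UNIT = { "hour" => 3600, "minute" => 60 }.freeze

# Returns the total number of seconds, raising ArgumentError when the
# same unit appears more than once in the text.
def strict_duration_seconds(text)
  # The optional plural "s" sits outside the capture group, so the
  # captured unit is always the singular "hour" or "minute".
  pairs = text.downcase.scan(/(\d+)\s+(hour|minute)s?/)

  units = pairs.map { |_number, unit| unit }
  if units.uniq.length != units.length
    raise ArgumentError, "duplicate unit in #{text.inspect}"
  end

  pairs.sum { |number, unit| number.to_i * SECONDS_PER_UNIT.fetch(unit) }
end

puts strict_duration_seconds("1 hour 30 minutes")
puts strict_duration_seconds("2 hours")
puts strict_duration_seconds("45 minutes")
```

Looking the unit up in a fixed hash also avoids calling send with text taken from the input.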


    Here's what I would do; I believe it is the most straightforward way:

    str = "1 hour 30 minutes"
    # String#[] returns nil when there is no match, and nil.to_i is 0
    h = str[/(\d+) hour/, 1].to_i
    m = str[/(\d+) minute/, 1].to_i
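
To round this out, the two values can be combined into a single total. The sketch below wraps the same two lookups in a hypothetical helper named total_minutes and uses plain integer arithmetic; with ActiveSupport loaded, h.hours + m.minutes would give the duration the question asks for.

```ruby
# Hypothetical helper: extracts hours and minutes independently and
# returns the total in minutes. A missing unit counts as 0.
def total_minutes(str)
  h = str[/(\d+) hour/, 1].to_i   # nil.to_i is 0 when "hour" is absent
  m = str[/(\d+) minute/, 1].to_i # nil.to_i is 0 when "minute" is absent
  h * 60 + m
end

["1 hour 30 minutes", "2 hours", "45 minutes"].each do |str|
  puts total_minutes(str)
end
```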
    