What is the optimal winning strategy for this modified blackjack game?

Questions

Is there a best value to stay on so that I win the greatest percentage of games possible? If so, what is it?

Edit: Is there an exact probability of winning that can be calculated for a given limit, independent of whatever the opponent does? (I haven't done probability & statistics since college). I'd be interested in seeing that as an answer to contrast it with my simulated results.

Edit: Fixed bugs in my algorithm, updated result table.

Background

I've been playing a modified blackjack game with some rather annoying rule tweaks from the standard rules. I've italicized the rules that are different from the standard blackjack rules, as well as included the rules of blackjack for those not familiar.

Modified Blackjack Rules

  • Exactly two human players (dealer is irrelevant)
  • Each player is dealt two cards face down
  • Neither player _ever_ knows the value of _any_ of the opponent's cards
  • Neither player knows the value of the opponent's hand until _both_ have finished the hand
  • Goal is to come as close to score of 21 as possible. Outcomes:
  • If player's A & B have identical score, game is a draw
  • If player's A & B both have a score over 21 (a bust), game is a draw
  • If player A's score is <= 21 and player B has busted, player A wins
  • If player A's score is greater than player B's, and neither have busted, player A wins
  • Otherwise, player A has lost (B has won).
  • Cards are worth:
  • Cards 2 through 10 are worth the corresponding amount of points
  • Cards J, Q, K are worth 10 points
  • Card Ace is worth 1 or 11 points
  • Each player may request additional cards one at a time until:
  • The player doesn't want any more (stay)
  • The player's score, with any Aces counted as 1, exceeds 21 (bust)
  • Neither player knows how many cards the other has used at any time
  • Once both players have either stayed or busted the winner is determined per rule 3 above.
  • After each hand the entire deck is reshuffled and all 52 cards are in play again
  • What is a deck of cards?

    A deck of cards consists of 52 cards, four each of the following 13 values:

    2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A

    No other property of the cards are relevant.

    A Ruby representation of this is:

    CARDS = ((2..11).to_a+[10]*3)*4
    

    Algorithm

    I've been approaching this as follows:

  • I will always want to hit if my score is 2 through 11, as it is impossible to bust
  • For each of the scores 12 through 21 I will simulate N hands against an opponent
  • For these N hands, the score will be my "limit". Once I reach the limit or greater, I will stay.
  • My opponent will follow the exact same strategy
  • I will simulate N hands for every permutation of the sets (12..21), (12..21)
  • Print the difference in wins and losses for each permutation as well as the net win loss difference
  • Here is the algorithm implemented in Ruby:

    #!/usr/bin/env ruby
    class Array
      def shuffle
        sort_by { rand }
      end
    
      def shuffle!
        self.replace shuffle
      end
    
      def score
        sort.each_with_index.inject(0){|s,(c,i)|
          s+c > 21 - (size - (i + 1)) && c==11 ? s+1 : s+c
        }
      end
    end
    
    N=(ARGV[0]||100_000).to_i
    NDECKS = (ARGV[1]||1).to_i
    
    CARDS = ((2..11).to_a+[10]*3)*4*NDECKS
    CARDS.shuffle
    
    my_limits = (12..21).to_a
    opp_limits = my_limits.dup
    
    puts " " * 55 + "opponent_limit"
    printf "my_limit |"
    opp_limits.each do |result|
      printf "%10s", result.to_s
    end
    printf "%10s", "net"
    puts
    
    printf "-" * 8 + " |"
    print "  " + "-" * 8
    opp_limits.each do |result|
      print "  " + "-" * 8
    end
    puts
    
    win_totals = Array.new(10)
    win_totals.map! { Array.new(10) }
    
    my_limits.each do |my_limit|
      printf "%8s |", my_limit
      $stdout.flush
      opp_limits.each do |opp_limit|
    
        if my_limit == opp_limit # will be a tie, skip
          win_totals[my_limit-12][opp_limit-12] = 0
          print "        --"
          $stdout.flush
          next
        elsif win_totals[my_limit-12][opp_limit-12] # if previously calculated, print
          printf "%10d", win_totals[my_limit-12][opp_limit-12]
          $stdout.flush
          next
        end
    
        win = 0
        lose = 0
        draw = 0
    
        N.times {
          cards = CARDS.dup.shuffle
          my_hand = [cards.pop, cards.pop]
          opp_hand = [cards.pop, cards.pop]
    
          # hit until I hit limit
          while my_hand.score < my_limit
            my_hand << cards.pop
          end
    
          # hit until opponent hits limit
          while opp_hand.score < opp_limit
            opp_hand << cards.pop
          end
    
          my_score = my_hand.score
          opp_score = opp_hand.score
          my_score = 0 if my_score > 21 
          opp_score = 0 if opp_score > 21
    
          if my_hand.score == opp_hand.score
            draw += 1
          elsif my_score > opp_score
            win += 1
          else
            lose += 1
          end
        }
    
        win_totals[my_limit-12][opp_limit-12] = win-lose
        win_totals[opp_limit-12][my_limit-12] = lose-win # shortcut for the inverse
    
        printf "%10d", win-lose
        $stdout.flush
      end
      printf "%10d", win_totals[my_limit-12].inject(:+)
      puts
    end
    

    Usage

    ruby blackjack.rb [num_iterations] [num_decks]
    

    The script defaults to 100,000 iterations and 4 decks. 100,000 takes about 5 minutes on a fast macbook pro.

    Output (N = 100 000)

                                                           opponent_limit
    my_limit |        12        13        14        15        16        17        18        19        20        21       net
    -------- |  --------  --------  --------  --------  --------  --------  --------  --------  --------  --------  --------
          12 |        --     -7666    -13315    -15799    -15586    -10445     -2299     12176     30365     65631     43062
          13 |      7666        --     -6962    -11015    -11350     -8925      -975     10111     27924     60037     66511
          14 |     13315      6962        --     -6505     -9210     -7364     -2541      8862     23909     54596     82024
          15 |     15799     11015      6505        --     -5666     -6849     -4281      4899     17798     45773     84993
          16 |     15586     11350      9210      5666        --     -6149     -5207       546     11294     35196     77492
          17 |     10445      8925      7364      6849      6149        --     -7790     -5317      2576     23443     52644
          18 |      2299       975      2541      4281      5207      7790        --    -11848     -7123      8238     12360
          19 |    -12176    -10111     -8862     -4899      -546      5317     11848        --    -18848     -8413    -46690
          20 |    -30365    -27924    -23909    -17798    -11294     -2576      7123     18848        --    -28631   -116526
          21 |    -65631    -60037    -54596    -45773    -35196    -23443     -8238      8413     28631        --   -255870
    

    Interpretation

    This is where I struggle. I'm not quite sure how to interpret this data. At first glance it seems like always staying at 16 or 17 is the way to go, but I'm not sure if it's that easy. I think it's unlikely that an actual human opponent would stay on 12, 13, and possibly 14, so should I throw out those opponent_limit values? Also, how can I modify this to take into account the variability of a real human opponent? eg a real human is likely to stay on 15 just based on a "feeling" and may also hit on 18 based on a "feeling"


    I'm suspicious of your results. For example, if the opponent aims for 19, your data says that the best way to beat him is to hit until you reach 20. This does not pass a basic smell test. Are you sure you don't have a bug? If my opponent is striving for 19 or better, my strategy would be to avoid busting at all costs: stay on anything 13 or higher (maybe even 12?). Going for 20 has to wrong -- and not just by a small margin, but by a lot.

    How do I know that your data is bad? Because the blackjack game you are playing isn't unusual . It's the way a dealer plays in most casinos: the dealer hits up to a target and then stops, regardless of what the other players hold in their hands. What is that target? Stand on hard 17 and hit soft 17. When you get rid of the bugs in your script, it should confirm that the casinos know their business.

    When I make the following replacements to your code:

    # Replace scoring method.
    def score
      s = inject(0) { |sum, c| sum + c }
      return s if s < 21
      n_aces = find_all { |c| c == 11 }.size
      while s > 21 and n_aces > 0
          s -= 10
          n_aces -= 1
      end
      return s
    end
    
    # Replace section of code determining hand outcome.
    my_score  = my_hand.score
    opp_score = opp_hand.score
    my_score  = 0 if my_score  > 21
    opp_score = 0 if opp_score > 21
    if my_score == opp_score
      draw += 1
    elsif my_score > opp_score
      win += 1
    else
      lose += 1
    end
    

    The results agree with the behavior of casino dealers: 17 is the optimal target .

    n=10000
                                                           opponent_limit
    my_limit |        12        13        14        15        16        17        18        19        20        21       net
    -------- |  --------  --------  --------  --------  --------  --------  --------  --------  --------  --------  --------
          12 |        --      -843     -1271     -1380     -1503     -1148      -137      1234      3113      6572
          13 |       843        --      -642     -1041     -1141      -770       -93      1137      2933      6324
          14 |      1271       642        --      -498      -784      -662        93      1097      2977      5945
          15 |      1380      1041       498        --      -454      -242      -100       898      2573      5424
          16 |      1503      1141       784       454        --      -174        69       928      2146      4895
          17 |      1148       770       662       242       174        --        38       631      1920      4404
          18 |       137        93       -93       100       -69       -38        --       489      1344      3650
          19 |     -1234     -1137     -1097      -898      -928      -631      -489        --       735      2560
          20 |     -3113     -2933     -2977     -2573     -2146     -1920     -1344      -735        --      1443
          21 |     -6572     -6324     -5945     -5424     -4895     -4404     -3650     -2560     -1443        --
    

    Some miscellaneous comments :

    The current design is inflexible. With a just little refactoring, you could achieve a clean separation between the operation of the game (dealing, shuffling, keeping running stats) and player decision making. This would allow you to test various strategies against each other. Currently, your strategies are embedded in loops that are all tangled up in the game operation code. Your experimentation would be better served by a design that allowed you to create new players and set their strategy at will.


    Two comments:

  • It looks like there isn't a single dominating strategy based on a "hit limit":

  • If you choose 16 your opponent can choose 17
  • if you choose 17 your opponent can choose 18
  • if you choose 18 your opponent can choose 19
  • if you choose 19 your opponent can choose 20
  • if you choose 20 your opponent can choose 12
  • if you choose 12 your opponent can choose 16.
  • 2. You do not mention whether players can see how many cards their opponent has drawn (I would guess so). I would expect this information to be incorporated into the "best" strategy. (answered)


    With no information about the other players decisions, the game gets simpler. But since there is clearly no dominant "pure" strategy, the optimal strategy will be a "mixed" strategy. That is: a set of probabilities for each score from 12 to 21 for whether you should stop or draw another card (EDIT: you will need different probabilities for a given score with no aces vs the score with aces.) Executing the strategy then requires you to randomly choose (according to the probabilities) whether to stop or continue after each new draw. You can then find the Nash equilibrium for the game.

    Of course, if you are only asking the simpler question: what is the optimal winning strategy against suboptimal players (for example ones that always stop on 16, 17, 18 or 19) you are asking an entirely diiferent question, and you will have to specify exactly in which way the other player is limited compared to you.


    Here are some thoughts about the data you've collected:

  • It's mildly useful for telling you what your "hit limit" should be, but only if you know that your opponent is following a similar "hit limit" strategy.
  • Even then, It's only really useful if you know what your opponent's "hit limit" is or is likely to be. You can just choose a limit that gives you more wins than them.
  • You can more or less ignore the actual values in the table. It's whether they're positive or negative that matters.
  • To show your data another way, the first number is your opponent's limit, and the second group of numbers are the limits you can choose and win with. The one with an asterisk is the "winningest" choice:

    12:   13, 14, 15, 16*, 17, 18
    13:   14, 15, 16*, 17, 18, 19
    14:   15, 16, 17*, 18, 19
    15:   16, 17*, 18, 19
    16:   17, 18*, 19
    17:   18*, 19
    18:   19*, 20
    19:   12, 20*
    20:   12*, 13, 14, 15, 16, 17
    21:   12*, 13, 14, 15, 16, 17, 18, 19, 20
    

    From that, you can see that a hit limit of 17 or 18 is the safest option if the opponent is following a random "hit limit" selection strategy because 17 and 18 will beat 7/10 opponent "hit limits".

    Of course if your opponent is human, you can't reply on them self-imposing a "hit limit" of under 18 or over 19, so that completely negates the previous calculations. I still think these numbers are useful however:


    I agree that for any individual hand, you can be reasonably confident that your opponent will have a limit after which they will stop hitting, and they'll stay. If you can guess at that limit, you can choose your own limit based on that estimate.

    If you think they're being optimistic or they're happy to risk it, choose a limit of 20 - you'll beat them in the long run provided their limit is above 17. If you're really confident, choose a limit of 12 - that will win if their limit is above 18 and there are much more frequent winnings to be had here.

    If you think they're being conservative or risk averse, choose a limit of 18. That will win if they're staying anywhere below 18 themselves.

    For neutral ground, maybe think about what your limit would be without any outside influence. Would you normally hit on a 16? A 17?

    In short, you can only guess at what your opponent's limit is, but if you guess well, you can beat them over the long term with those statistics.

    链接地址: http://www.djcxy.com/p/30696.html

    上一篇: 如何将玩家分为多个部门?

    下一篇: 这个修改过的二十一点游戏的最佳获胜策略是什么?