Erlang supervisor fails to start children

The supervisor seems to fail silently when starting children.

Here's the supervisor:

-module(gateway_sup).
-behaviour(supervisor).
-export([start_socket/0, init/1, start_link/1]).

-define(SSL_OPTIONS, [{active, once},
                      {backlog, 128},
                      {reuseaddr, true},
                      {packet, 0},
                      {cacertfile, "./ssl_key/server/gd_bundle.crt"},
                      {certfile, "./ssl_key/server/cert.pem"},
                      {keyfile, "./ssl_key/server/cert.key"},
                      {password, "**********"}
                     ]).

start_link(Port) ->
    Role = list_to_atom(atom_to_list(?MODULE) ++ lists:flatten(io_lib:format("~B", [Port]))),
    supervisor:start_link({local, Role}, ?MODULE, [Port]).

init([Port]) ->
    R = ssl:listen(Port, ?SSL_OPTIONS),
    LSocket = case R of
                  {ok, LSock} ->
                      LSock;
                  Res ->
                      io:fwrite("gateway_sup Error: ~p~n", [Res])
              end,
    spawn_link(fun empty_listeners/0),
    ChildSpec = [{socket,
                  {gateway_serv, start_link, [LSocket]},
                  temporary, 1000, worker, [gateway_serv]}
                ],
    {ok, {{simple_one_for_one, 3600, 3600},
          ChildSpec
         }}.

empty_listeners() ->
    io:fwrite("---------------------- empty_listeners~n"),
    [start_socket() || _ <- lists:seq(1,128)],
    ok.

start_socket() ->
    io:fwrite("++++++++++++++++++++++ start_socket~n"),
    supervisor:start_child(?MODULE, []).

And here's the gen_server:

-module(gateway_serv).

-behaviour(gen_server).
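-export([start_link/1, init/1, handle_call/3, handle_cast/2, handle_info/2, code_change/3, terminate/2]).

%% The record definition is not shown in the question; fields inferred from usage.
-record(client, {listenSocket, socket, pid}).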

start_link(LSocket) ->
    io:fwrite("#################~n"),
    gen_server:start_link(?MODULE, [LSocket], []).

init([LSocket]) ->
    io:fwrite("/////////////////~n"),
    gen_server:cast(self(), accept),
    {ok, #client{listenSocket=LSocket, pid=self()}}.

handle_cast(accept, G = #client{listenSocket=LSocket}) ->
    {ok, AcceptSocket} = ssl:transport_accept(LSocket),
    gateway_sup:start_socket(),
    case ssl:ssl_accept(AcceptSocket, 30000) of
        ok ->
            timer:send_after(10000, closingSocket),
            ssl:setopts(AcceptSocket, [{active, once}, {mode, list}, {packet, 0}]),
            {noreply, G#client{listenSocket=none, socket=AcceptSocket}};
        {error, _Reason} ->
            {stop, normal, G}
    end;
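handle_cast(_, G) ->
    {noreply, G}.

%% Minimal stubs for the other exported callbacks (not shown in the question).
handle_call(_Request, _From, G) ->
    {reply, ok, G}.

handle_info(_Info, G) ->
    {noreply, G}.

code_change(_OldVsn, G, _Extra) ->
    {ok, G}.

terminate(_Reason, _G) ->
    ok.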

The gen_server's start_link/1 is apparently never called (I checked with an io:fwrite).

I can't seem to find out why.


When you register the supervisor, you use:

Role = list_to_atom(atom_to_list(?MODULE) ++ lists:flatten(io_lib:format("~B", [Port]))),

Therefore, when you call:

start_socket() ->
    io:fwrite("++++++++++++++++++++++ start_socket~n"),
    supervisor:start_child(?MODULE, []).

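you are calling a supervisor that does not exist. The supervisor is registered as gateway_sup followed by the port number (gateway_sup443 for port 443, for instance), not as gateway_sup, so supervisor:start_child/2 exits with reason noproc inside the process spawned for empty_listeners, which is probably why nothing visible happens.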
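To see the failure in isolation, here is a small sketch; the module name noproc_demo is hypothetical. It calls supervisor:start_child/2 with the bare gateway_sup atom while nothing is registered under that name and catches the resulting exit.

```erlang
-module(noproc_demo).
-export([run/0]).

%% Hypothetical demo: what supervisor:start_child/2 does when given a
%% registered name that no process holds (here, the bare gateway_sup atom).
run() ->
    %% Nothing is registered under this name.
    undefined = whereis(gateway_sup),
    try supervisor:start_child(gateway_sup, []) of
        Result -> {unexpected, Result}
    catch
        %% The underlying gen_server call exits with a {noproc, ...} reason.
        exit:{noproc, _} -> noproc
    end.
```

Calling noproc_demo:run() returns noproc. In the question's code nobody catches this exit, so the spawned process simply dies.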

You should call it as:

supervisor:start_child(Role, []).

You can pass Role as a parameter to start_socket (and to empty_listeners and gateway_serv, since gateway_serv also calls start_socket).
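A minimal sketch of that change, under stated assumptions: port_sup is a hypothetical stand-in for gateway_sup, the SSL listen socket is left out, and start_worker/1 is a placeholder for gateway_serv:start_link/1. The name is computed once from the port and handed to every function that calls start_child/2. The filler still runs from init, as in the question.

```erlang
-module(port_sup).
-behaviour(supervisor).
-export([start_link/1, start_socket/1, init/1, start_worker/1]).

%% Registered name derived from the port, e.g. port_sup8443 for 8443.
name(Port) ->
    list_to_atom(atom_to_list(?MODULE) ++ integer_to_list(Port)).

start_link(Port) ->
    supervisor:start_link({local, name(Port)}, ?MODULE, [Port]).

%% Takes the supervisor's name instead of hard-coding ?MODULE.
start_socket(SupName) ->
    supervisor:start_child(SupName, []).

init([Port]) ->
    Name = name(Port),
    %% Same structure as the question, but the name is passed along.
    spawn_link(fun() -> empty_listeners(Name, 4) end),
    %% The worker receives the name too, so it could call
    %% start_socket(Name) itself, as gateway_serv does.
    ChildSpec = {socket, {?MODULE, start_worker, [Name]},
                 temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 3600, 3600}, [ChildSpec]}}.

%% Starts N children through the correctly named supervisor.
empty_listeners(Name, N) ->
    lists:foreach(fun(_) -> {ok, _} = start_socket(Name) end,
                  lists:seq(1, N)).

%% Placeholder worker (hypothetical): a linked process waiting for 'stop'.
start_worker(_SupName) ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

After port_sup:start_link(8443), supervisor:which_children(port_sup8443) should list four children once the filler process has run.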


Something seems strange to me: you launch empty_listeners, which calls start_socket(), which calls supervisor:start_child/2, from within the supervisor's init function, while the supervisor has not yet finished its initialization phase. So there is a race between the process that asks the supervisor to start children and the supervisor itself.

I think this code should be outside the init function:

  • First start the supervisor using start_link(Port),
  • and when it returns, call start_socket().
  • I built an application that uses this pattern, with two levels of supervisors:

    main supervisor (one_for_all strategy)
    |                         |
    |                         |
    v                         v
    application   ------->    supervisor (simple_one_for_one strategy)
    server      start_child   worker factory
                              |
                              |
                              v*
                              many children
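The "start first, then fill" pattern from the list above might look like the following sketch. The module pool_sup, the function start_pool/2, and the placeholder worker are hypothetical; in the application described, the caller would be the application server rather than a helper in the same module.

```erlang
-module(pool_sup).
-behaviour(supervisor).
-export([start_pool/2, init/1, start_worker/0]).

%% start_link/3 returns only after init/1 has finished, so every
%% start_child/2 call below reaches a fully initialized supervisor.
start_pool(Name, N) ->
    {ok, Sup} = supervisor:start_link({local, Name}, ?MODULE, []),
    Children = [start_one(Sup) || _ <- lists:seq(1, N)],
    {ok, Sup, Children}.

start_one(Sup) ->
    {ok, Pid} = supervisor:start_child(Sup, []),
    Pid.

%% init/1 only describes the children; it starts none itself.
init([]) ->
    ChildSpec = {acceptor, {?MODULE, start_worker, []},
                 temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 3600, 3600}, [ChildSpec]}}.

%% Placeholder for gateway_serv:start_link/1 (hypothetical).
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

Here init/1 stays purely declarative, and whoever calls start_pool/2 decides how many children to start.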
    

    EDIT: Forget about this race condition.

    I ran a test that introduces a delay before the end of the init function, and saw that start_child waits for init to finish, so nothing is lost. The OTP developers have been even more careful than I imagined.
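Here is a reconstruction of that kind of experiment; it is a hypothetical sketch, not the test actually run above. The module slow_sup spawns a process from init/1 that immediately calls start_child/2, then sleeps 500 ms before returning. The supervisor's name is already registered while init/1 runs, so the call does not fail with noproc; it waits in the mailbox and is served once init/1 returns. This works only because the call comes from a separate process: inside init/1 itself, the supervisor would be calling itself.

```erlang
-module(slow_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Asks for a child while init/1 is still running; the request is
    %% queued and handled after init/1 returns.
    spawn_link(fun() -> {ok, _} = supervisor:start_child(?MODULE, []) end),
    %% Artificial delay before init/1 finishes.
    timer:sleep(500),
    ChildSpec = {worker, {?MODULE, start_worker, []},
                 temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 10, 10}, [ChildSpec]}}.

%% Placeholder worker (hypothetical): a linked process waiting for 'stop'.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.
```

After slow_sup:start_link() returns, supervisor:which_children(slow_sup) should show the single child requested during init/1.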

