Keeping instances in sync

Sometimes individual test plan instances need to coordinate what they are doing. For this, we use the sync service and the primitives it offers, such as states and barriers.

The general concept is this -- as a plan reaches a phase of execution for which synchronization is required, it will signal the other instances that it has reached that phase. Then, wait until a certain number of other instances reach the same state before continuing.

?> Internally, state signalling is implemented with a simple atomic counter backed by a Redis database. To wait for a signal barrier means to wait until the counter reaches the specified number, i.e. instances signalling on that key.

Let's have a look at a plan that waits for others! The plan displayed here is taken from the example plans.

Notice the use of:

  • client.MustSignalEntry(ctx, readyState) to signal the entry to the synchronized portion of the plan.

    • When we signal entry, the sync service returns a sequence number, i.e. the amount of instances that have signaled. We can use this piece of data to pick a role to assume in the test plan (e.g. leader or follower in this case).

  • <-client.MustBarrier(ctx, readyState, numFollowers).C to wait for others to reach the same state.

In this test case, the instance with seq==1 will act as the coordinator and all the others will follow.

plans/example/sync.go

func ExampleSync(runenv *runtime.RunEnv) error {
	var (
		enrolledState = sync.State("enrolled")
		readyState    = sync.State("ready")
		releasedState = sync.State("released")

		ctx = context.Background()
	)

	// instantiate a sync service client, binding it to the RunEnv.
	client := sync.MustBoundClient(ctx, runenv)
	defer client.Close()

	// instantiate a network client; see 'Traffic shaping' in the docs.
	netclient := network.NewClient(client, runenv)
	runenv.RecordMessage("waiting for network initialization")

	// wait for the network to initialize; this should be pretty fast.
	netclient.MustWaitNetworkInitialized(ctx)
	runenv.RecordMessage("network initilization complete")

	// signal entry in the 'enrolled' state, and obtain a sequence number.
	seq := client.MustSignalEntry(ctx, enrolledState)

	runenv.RecordMessage("my sequence ID: %d", seq)

	// if we're the first instance to signal, we'll become the LEADER.
	if seq == 1 {
		runenv.RecordMessage("i'm the leader.")
		numFollowers := runenv.TestInstanceCount - 1

		// let's wait for the followers to signal.
		runenv.RecordMessage("waiting for %d instances to become ready", numFollowers)
		err := <-client.MustBarrier(ctx, readyState, numFollowers).C
		if err != nil {
			return err
		}

		runenv.RecordMessage("the followers are all ready")
		runenv.RecordMessage("ready...")
		time.Sleep(1 * time.Second)
		runenv.RecordMessage("set...")
		time.Sleep(5 * time.Second)
		runenv.RecordMessage("go, release followers!")

		// signal on the 'released' state.
		client.MustSignalEntry(ctx, releasedState)
		return nil
	}

	rand.Seed(time.Now().UnixNano())
	sleep := rand.Intn(5)
	runenv.RecordMessage("i'm a follower; signalling ready after %d seconds", sleep)
	time.Sleep(time.Duration(sleep) * time.Second)
	runenv.RecordMessage("follower signalling now")

	// signal entry in the 'ready' state.
	client.MustSignalEntry(ctx, readyState)

	// wait until the leader releases us.
	err := <-client.MustBarrier(ctx, releasedState, 1).C
	if err != nil {
		return err
	}

	runenv.RecordMessage("i have been released")
	return nil
}

When we run this, we need to add more than one instance. For testing, let's run it with 10 instances and have a look at the output.

Using --collect we can get an archive of the output from all instances into an .tgz after the run completes. Then we can poke around, or process it with scripts. You will notice that most of the output generated by Testground is in easy-to-parse JSON format.

?> You can also collect the output after the fact using the testground collect command.

# run the test plan
$ testground run single --plan example \
                        --testcase sync \
                        --builder exec:go \
                        --runner local:exec \
                        --instances 10 \
                        --wait \
                        --collect

[...]

>>> Result:

May  7 16:59:19.197351	INFO	finished run with ID: 551fd2421ca1

>>> Result:

May  7 16:59:19.209681	INFO	created file: 551fd2421ca1.tgz

# extract the outputs archive
$ tar -xzvf 551fd2421ca1.tgz

$ cd 551fd2421ca1

# print out the contents of all `run.out` from all instances
$ find . | grep run.out | xargs awk '{print FILENAME, " >>> ", $0 }' | sort

./single/0/run.out  >>>  {"ts":1588769459341224000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"registering default http handler at: http://[::]:51677/ (pprof: http://[::]:51677/debug/pprof/)"}}
./single/0/run.out  >>>  {"ts":1588769459341303000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"start","runenv":{"plan":"example","case":"sync","run":"e217c3a8a089","params":{},"instances":4,"outputs_path":"/Users/nonsense/tghome/data/outputs/local_exec/example/e217c3a8a089/single/0","network":"127.1.0.0/16","group":"single","group_instances":4}}}
./single/0/run.out  >>>  {"ts":1588769459424266000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Waiting for network initialization"}}
./single/0/run.out  >>>  {"ts":1588769459424312000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"network initialisation successful"}}
./single/0/run.out  >>>  {"ts":1588769459424330000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Network initilization complete"}}
./single/0/run.out  >>>  {"ts":1588769459425997000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"My sequence ID: 4"}}
./single/0/run.out  >>>  {"ts":1588769459426060000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"I'm a follower. Signaling ready after 1 seconds"}}
./single/0/run.out  >>>  {"ts":1588769460354489000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 1 points"}}
./single/0/run.out  >>>  {"ts":1588769465354614000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 31 points"}}
./single/0/run.out  >>>  {"ts":1588769470353134000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 31 points"}}
./single/0/run.out  >>>  {"ts":1588769470434227000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Received Start"}}
./single/0/run.out  >>>  {"ts":1588769470434337000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"finish","outcome":"ok"}}
./single/0/run.out  >>>  {"ts":1588769470443381000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"io closed"}}
./single/0/run.out  >>>  {"ts":1588769470458073000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 1 points"}}
./single/1/run.out  >>>  {"ts":1588769459341256000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"registering default http handler at: http://[::]:51679/ (pprof: http://[::]:51679/debug/pprof/)"}}
./single/1/run.out  >>>  {"ts":1588769459341329000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"start","runenv":{"plan":"example","case":"sync","run":"e217c3a8a089","params":{},"instances":4,"outputs_path":"/Users/nonsense/tghome/data/outputs/local_exec/example/e217c3a8a089/single/1","network":"127.1.0.0/16","group":"single","group_instances":4}}}
./single/1/run.out  >>>  {"ts":1588769459424352000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Waiting for network initialization"}}
./single/1/run.out  >>>  {"ts":1588769459424387000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"network initialisation successful"}}
./single/1/run.out  >>>  {"ts":1588769459424404000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Network initilization complete"}}
./single/1/run.out  >>>  {"ts":1588769459425996000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"My sequence ID: 3"}}
./single/1/run.out  >>>  {"ts":1588769459426058000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"I'm a follower. Signaling ready after 3 seconds"}}
./single/1/run.out  >>>  {"ts":1588769460354489000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 1 points"}}
./single/1/run.out  >>>  {"ts":1588769465354611000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 31 points"}}
./single/1/run.out  >>>  {"ts":1588769470351967000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 31 points"}}
./single/1/run.out  >>>  {"ts":1588769470431852000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Received Start"}}
./single/1/run.out  >>>  {"ts":1588769470432029000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"finish","outcome":"ok"}}
./single/1/run.out  >>>  {"ts":1588769470438221000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"io closed"}}
./single/1/run.out  >>>  {"ts":1588769470444752000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 1 points"}}
./single/2/run.out  >>>  {"ts":1588769459341164000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"registering default http handler at: http://[::]:6060/ (pprof: http://[::]:6060/debug/pprof/)"}}
./single/2/run.out  >>>  {"ts":1588769459341225000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"start","runenv":{"plan":"example","case":"sync","run":"e217c3a8a089","params":{},"instances":4,"outputs_path":"/Users/nonsense/tghome/data/outputs/local_exec/example/e217c3a8a089/single/2","network":"127.1.0.0/16","group":"single","group_instances":4}}}
./single/2/run.out  >>>  {"ts":1588769459423576000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Waiting for network initialization"}}
./single/2/run.out  >>>  {"ts":1588769459423623000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"network initialisation successful"}}
./single/2/run.out  >>>  {"ts":1588769459423634000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Network initilization complete"}}
./single/2/run.out  >>>  {"ts":1588769459425071000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"My sequence ID: 1"}}
./single/2/run.out  >>>  {"ts":1588769459425111000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"I'm the boss."}}
./single/2/run.out  >>>  {"ts":1588769459425128000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Waiting for 3 instances to become ready"}}
./single/2/run.out  >>>  {"ts":1588769460354489000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 1 points"}}
./single/2/run.out  >>>  {"ts":1588769463433338000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"The followers are all ready"}}
./single/2/run.out  >>>  {"ts":1588769463433419000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Ready..."}}
./single/2/run.out  >>>  {"ts":1588769464434839000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Set..."}}
./single/2/run.out  >>>  {"ts":1588769465354620000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 31 points"}}
./single/2/run.out  >>>  {"ts":1588769469438375000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Go!"}}
./single/2/run.out  >>>  {"ts":1588769469439899000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"finish","outcome":"ok"}}
./single/2/run.out  >>>  {"ts":1588769469445744000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"io closed"}}
./single/2/run.out  >>>  {"ts":1588769469448944000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 32 points"}}
./single/3/run.out  >>>  {"ts":1588769459341236000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"registering default http handler at: http://[::]:51678/ (pprof: http://[::]:51678/debug/pprof/)"}}
./single/3/run.out  >>>  {"ts":1588769459341314000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"start","runenv":{"plan":"example","case":"sync","run":"e217c3a8a089","params":{},"instances":4,"outputs_path":"/Users/nonsense/tghome/data/outputs/local_exec/example/e217c3a8a089/single/3","network":"127.1.0.0/16","group":"single","group_instances":4}}}
./single/3/run.out  >>>  {"ts":1588769459424359000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Waiting for network initialization"}}
./single/3/run.out  >>>  {"ts":1588769459424408000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"network initialisation successful"}}
./single/3/run.out  >>>  {"ts":1588769459424492000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Network initilization complete"}}
./single/3/run.out  >>>  {"ts":1588769459425979000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"My sequence ID: 2"}}
./single/3/run.out  >>>  {"ts":1588769459426057000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"I'm a follower. Signaling ready after 1 seconds"}}
./single/3/run.out  >>>  {"ts":1588769460354631000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 1 points"}}
./single/3/run.out  >>>  {"ts":1588769465352913000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 31 points"}}
./single/3/run.out  >>>  {"ts":1588769470351959000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 31 points"}}
./single/3/run.out  >>>  {"ts":1588769470434215000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"Received Start"}}
./single/3/run.out  >>>  {"ts":1588769470434336000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"finish","outcome":"ok"}}
./single/3/run.out  >>>  {"ts":1588769470443382000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"io closed"}}
./single/3/run.out  >>>  {"ts":1588769470446654000,"msg":"","group_id":"single","run_id":"e217c3a8a089","event":{"type":"message","message":"influxdb: uploaded 1 points"}}

Last updated