This document explains a few of the options available for finding bugs in test plans and troubleshooting failures.
While writing a test plan, there are a few ways to troubleshoot. On this page, I will intentionally introduce bugs so we can see how the system behaves and how to troubleshoot it.
Build errors
$ testground plan create --plan planbuggy
The command above will create a default planbuggy test plan. Unfortunately, for our purposes, the plan has no bugs. Edit main.go so it contains the following buggier code:
main.go
package main

import (
	"github.com/testground/sdk-go/run"
	"github.com/testground/sdk-go/runtime"
)

func main() {
	run.Invoke(test)
}

func test(runenv *runtime.RunEnv) error {
	// No closing quote, will not build.
	runenv.RecordMessage("Hello Bugs)
	return nil
}
How it looks in terminal output
When this plan runs, the code is sent to the daemon to be built. Of course, this will fail. Notice that the output comes in several sections. The section labeled Server output shows us the error encountered by our builder.
$ testground run single --plan planbuggy --testcase quickstart --runner local:exec --builder exec:go --instances 1
May 5 00:31:15.650020  INFO  using home directory: /home/cory/testground
May 5 00:31:15.650143  INFO  no .env.toml found at /home/cory/testground/.env.toml; running with defaults
May 5 00:31:15.650182  INFO  testground client initialized  {"addr": "localhost:8042"}
May 5 00:31:15.651180  INFO  using home directory: /home/cory/testground
May 5 00:31:15.651300  INFO  no .env.toml found at /home/cory/testground/.env.toml; running with defaults
May 5 00:31:15.651339  INFO  testground client initialized  {"addr": "localhost:8042"}
May 5 00:31:15.651772  INFO  test plan source at: /home/cory/testground/plans/planbuggy

>>> Server output:

May 5 00:31:15.662268  INFO  performing build for groups  {"req_id": "be8cc8ee", "plan": "planbuggy", "groups": ["single"], "builder": "exec:go"}
May 5 00:31:15.906353  ERROR  go build failed: # github.com/coryschwartz/planbuggy
./main.go:10:35: newline in string
./main.go:10:35: syntax error: unexpected newline, expecting comma or )
./main.go:11:2: syntax error: unexpected return at end of statement
  {"req_id": "be8cc8ee"}
May 5 00:31:15.906505  INFO  build failed  {"req_id": "be8cc8ee", "plan": "planbuggy", "groups": ["single"], "builder": "exec:go", "error": "failed to run the build; exit status 2"}
May 5 00:31:15.906563  WARN  engine build error: failed to run the build; exit status 2  {"req_id": "be8cc8ee"}

>>> Error:

engine build error: failed to run the build; exit status 2
In this case, the error is pretty straightforward, but in a more complex plan, this output can be difficult to parse. So what can you do?
Using standard debugging tools
Test plans are regular executables which accept their configuration through environment variables. Because of this, a plan can be compiled, tested, and run just like any other Go program; except for the sync service, the code can be exercised entirely outside of Testground. Let's build the same code locally this time, without sending it to the Testground daemon.
?> If your plan relies on knowledge of the test plan or test case, this can be passed as an environment variable.
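Here is a sketch of building the plan directly with the Go toolchain, assuming the default plan location from the quickstart (your path may differ):

$ cd ~/testground/plans/planbuggy
$ go build .
# github.com/coryschwartz/planbuggy
./main.go:10:35: newline in string
./main.go:10:35: syntax error: unexpected newline, expecting comma or )
./main.go:11:2: syntax error: unexpected return at end of statement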
Now that output is much more readable!
I can't claim that build errors will always be as easy to diagnose as this one, but this feature enables plan writers to employ traditional debugging techniques and any other tools with which they are already familiar.
Debugging with message output
The next technique is useful for plans which build correctly, but whose behaviour you want to observe while debugging. If you have ever debugged a program by adding logging or printing to the screen, you know exactly what I'm talking about. Testground plans can emit events and messages.
runenv.RecordMessage("this is a message")
Events can also be useful for debugging. Just like messages, events are a point-in-time capture of the current state. Events are included in the outputs collection, recorded in the order they occur for each plan instance. We created R() and D() metrics collectors (results and debugging). The difference between the two is that debugging metrics are sent to the metrics pipeline fairly quickly, whereas results are collected at the end of a test run.
var things int
for things < 100 {
	// work work work...
	things++
	// D() pushes this debug metric to the pipeline almost immediately.
	runenv.D().RecordPoint("how_many_things", float64(things))
}
// R() records a result, collected when the test run ends.
runenv.R().RecordPoint("total_things", float64(things))
In the following plan, five philosophers (all of them Ron Swanson) sit at a table with five forks between them. Unfortunately, there is an implementation bug, and these Ron Swansons will be here forever. Add some debugging messages using runenv.RecordMessage to see if you can straighten this whole thing out (hint: the answer is in the second tab).
exercise
packagemainimport ("sync""github.com/testground/sdk-go/run""github.com/testground/sdk-go/runtime")typeForkstruct { id int m *sync.Mutex}typeSwansonstruct { id int meals int leftFork, rightFork *Fork wg *sync.WaitGroup}func (ron *Swanson) Feast(runenv *runtime.RunEnv) { ron.leftFork.m.Lock() ron.rightFork.m.Lock() runenv.D().RecordPoint("eats", 1) ron.leftFork.m.Unlock() ron.meals = ron.meals -1if ron.meals >0 { runenv.RecordMessage("Still hungry. %d more meals to fill me up.", ron.meals) ron.Feast(runenv) } else { runenv.RecordMessage("All done. For now...") ron.wg.Done() }}funcmain() { run.Invoke(test)}functest(runenv *runtime.RunEnv) error {// Five hungry rons eat 10 plates of food each. countRons :=5 countMeals :=10 wg :=sync.WaitGroup{}// Create forks forks :=make([]*Fork, countRons)for i :=0; i < countRons; i++ { forks[i] =&Fork{ id: i, m: new(sync.Mutex), } }// Each ron swanson has has a fork to his left and right rons :=make([]*Swanson, countRons) wg.Add(countRons)for i :=0; i <= countRons; i++ { rons[i] =&Swanson{ id: i, leftFork: forks[i%countRons], rightFork: forks[(i+1)%countRons], meals: countMeals, wg: &wg, }go rons[i].Feast(runenv) } wg.Wait() runenv.RecordMessage("all rons have eaten")returnnil}
solution
In Feast(): Ron puts down his left fork, but forgets to put down his right fork!
Add another line to unlock the rightFork mutex to fix this problem.
In the loop that seats the Rons in test(): we think there are an equal number
of Rons and forks, but the loop condition is off by one. Two Rons are sharing
the same seat! This will not do.
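For reference, here is a sketch of what the two fixes could look like (illustrative fragments, not the full plan):

// In Feast: put down both forks after eating.
ron.leftFork.m.Unlock()
ron.rightFork.m.Unlock() // this unlock was missing

// In test: seat exactly countRons Rons.
for i := 0; i < countRons; i++ { // was: i <= countRons
	// ...
}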
If you can successfully debug this code, you will see each Ron finish his meals and, finally, the end message "all rons have eaten".
Collecting outputs vs viewing messages in the terminal
When using the local runners with a relatively small number of plan instances, it is fairly easy to view output directly in the terminal. I recommend troubleshooting a plan with a small number of instances. The same messages you can see in your terminal are also available in the outputs collection.
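To pull down the outputs collection once a run has finished, you can use the client's collect command. A sketch, assuming the run ID printed by testground run (exact flags may differ between versions):

$ testground collect --plan planbuggy <run-id>
$ tar -xzvf <run-id>.tgz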
Help! When starting a test plan (e.g. using testground run single) I get an error such as the following:
Pool overlaps with other one on this address space
Solution: Most likely you have a conflicting Docker network left over from a previous Testground run. To clean it up (see the example after these steps):
You can find any related Testground plan networks on your system using the command docker network ls;
In that output you should see some networks which have your test plan or test case referenced in their names;
Copy the relevant network IDs and remove them using the docker network rm <id> command.
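For example (a sketch; the network names will vary with your plan and test case):

$ docker network ls
$ docker network rm <id>
# or, to remove all unused Docker networks at once:
$ docker network prune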
Once you have completed these steps, you should be able to run your Testground plan without seeing the error above.
Accessing profile data
All Go test plans have profiling enabled by default.
For information about using Go's pprof and generating graphs and reports, I recommend starting with the official Go pprof documentation.
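If you haven't used pprof before, here is a minimal sketch of how a plain Go program exposes it. The Testground SDK registers a similar HTTP handler for you, which is why plans serve pprof on port 6060 by default:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // importing for side effects registers the /debug/pprof/ handlers
)

func main() {
	// Serve the default mux, which now includes the pprof endpoints.
	log.Println(http.ListenAndServe(":6060", nil))
}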
On Testground, gaining access to the pprof port can sometimes be non-obvious. Allow me to explain how to access port 6060 on each of our runners:
local:exec
# Plans in the local:exec runner operate in the default network namespace.
# The first plan will grab port 6060 and any additional plans will listen
# on a random port.
#
# Look for the following messages in the test run output to figure out
# the URL where to access the pprof endpoint of each instance:
#
# May 6 14:32:10.146239  INFO  0.0174s  MESSAGE  << instance 1 >>
# registering default http handler at: http://[::]:6060/
# (pprof: http://[::]:6060/debug/pprof/)
#
# May 6 14:32:10.146535  INFO  0.0179s  MESSAGE  << instance 2 >>
# registering default http handler at: http://[::]:64912/
# (pprof: http://[::]:64912/debug/pprof/)

$ echo "top" | go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
top: open top: no such file or directory
Fetched 1 source profiles out of 2
Saved profile in /home/cory/pprof/pprof.exec-go--planbuggy-ba70706f7cd1.alloc_objects.alloc_space.inuse_objects.inuse_space.003.pb.gz
File: exec-go--planbuggy-ba70706f7cd1
Type: inuse_space
Time: May 5, 2020 at 1:05am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) Showing nodes accounting for 544.67kB, 100% of 544.67kB total
      flat  flat%   sum%        cum   cum%
  544.67kB   100%   100%   544.67kB   100%  net.open
         0     0%   100%   544.67kB   100%  internal/singleflight.(*Group).doCall
         0     0%   100%   544.67kB   100%  net.(*Resolver).goLookupIPCNAMEOrder
         0     0%   100%   544.67kB   100%  net.(*Resolver).lookupIP
         0     0%   100%   544.67kB   100%  net.(*Resolver).lookupIPAddr.func1
         0     0%   100%   544.67kB   100%  net.glob..func1
         0     0%   100%   544.67kB   100%  net.goLookupIPFiles
         0     0%   100%   544.67kB   100%  net.lookupStaticHost
         0     0%   100%   544.67kB   100%  net.readHosts
local:docker
# The local:docker runner has isolated networking for each container.
# An ephemeral port is assigned in the host network namespace which maps
# to port 6060 in the container's namespace.
# Find out the port assignment using `docker ps` and then profile the
# appropriate port. For example, in this case I have three plans running
# in different containers. To profile one of these, I can point the pprof
# tool at a port in the host namespace.

$ docker ps -f 'name=tg-' --format "{{.Ports}}"
0.0.0.0:33279->6060/tcp
0.0.0.0:33278->6060/tcp
0.0.0.0:33280->6060/tcp

$ echo top | go tool pprof http://localhost:33279/debug/pprof/heap
Fetching profile over HTTP from http://localhost:33279/debug/pprof/heap
Saved profile in /home/cory/pprof/pprof.testplan.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: testplan
Type: inuse_space
Time: May 5, 2020 at 1:15am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) Showing nodes accounting for 1026.44kB, 100% of 1026.44kB total
Showing top 10 nodes out of 11
      flat  flat%   sum%        cum   cum%
     514kB 50.08% 50.08%      514kB 50.08%  bufio.NewWriterSize (inline)
  512.44kB 49.92%   100%   512.44kB 49.92%  runtime.allocm
         0     0%   100%      514kB 50.08%  net/http.(*conn).serve
         0     0%   100%      514kB 50.08%  net/http.newBufioWriterSize
         0     0%   100%   512.44kB 49.92%  runtime.handoffp
         0     0%   100%   512.44kB 49.92%  runtime.mcall
         0     0%   100%   512.44kB 49.92%  runtime.newm
         0     0%   100%   512.44kB 49.92%  runtime.park_m
         0     0%   100%   512.44kB 49.92%  runtime.schedule
         0     0%   100%   512.44kB 49.92%  runtime.startm
cluster:k8s
# When using the Kubernetes cluster:k8s runner, each container runs in a
# separate pod. Access the profiling port through the Kubernetes API using
# a port forward or kubectl proxy. The following example sets up a port
# forward to a sidecar pod and then runs a profiler against it.

$ kubectl port-forward testground-sidecar 6060:6060
$ echo top | go tool pprof http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/cory/pprof/pprof.testground.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: testground
Type: inuse_space
Time: May 5, 2020 at 1:41am (PDT)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) Showing nodes accounting for 3112.91kB, 100% of 3112.91kB total
Showing top 10 nodes out of 21
      flat  flat%   sum%        cum   cum%
 1025.94kB 32.96% 32.96%  1025.94kB 32.96%  regexp/syntax.(*compiler).inst (inline)
  548.84kB 17.63% 50.59%   548.84kB 17.63%  github.com/markbates/pkger/internal/takeon/github.com/markbates/hepa/filters.glob..func3
     514kB 16.51% 67.10%      514kB 16.51%  regexp.mergeRuneSets.func2 (inline)
  512.08kB 16.45% 83.55%  2052.02kB 65.92%  regexp.compile
  512.05kB 16.45%   100%   512.05kB 16.45%  runtime.systemstack
         0     0%   100%   512.62kB 16.47%  github.com/dustin/go-humanize.init.0
         0     0%   100%  1026.08kB 32.96%  github.com/go-playground/validator/v10.init
         0     0%   100%   548.84kB 17.63%  github.com/markbates/pkger/internal/takeon/github.com/markbates/hepa/filters.init
         0     0%   100%   513.31kB 16.49%  k8s.io/apimachinery/pkg/util/naming.init
         0     0%   100%  2052.02kB 65.92%  regexp.Compile (inline)