4 tips for logging on GCP using golang and logrus

My experiences on logging to GCP Kubernetes that hopefully to be helpful for you as well
Huy Ngo
Feb 25, 2020

Logging is important part and has long history developed for different types of applications. Software development changes rapidly from all of its aspects. Logging is not an exception. In the old days, enterprise application developers used to focus mostly on output location, rotation, and rollover. Nowadays, as we are moving to cloud deployment, logging also come with its new standards.

According to The Twelve-Factor App, cloud native apps should log as event streams and never concern itself with routing or storage. The log will still be handled somewhere else by cloud provider or another process. Therefore, logging on cloud native apps focus more on distributing events follow industrial standards.

Below are my tips for logging on GCP Kubernetes using Golang and Logrus library.

Add context to your messages

It’ll be very frustrating if we are troubleshooting an issue but find no log in the target section. Do you know what is even more frustrating? That is when we do have log, but the log actually does not give you any valuable info. No thing is worse than this kind of message:

Transaction failed

or

Data validation error

Without proper context, these messages add not much value. Below messages make much more senses in term of troubleshooting:

Transaction failed: insufficient funds

or

Data validation error: invalid email address

or

Transaction 123456 was successfully, amount transferred: $42,394

We can improve these messages a little. Logs are not only for human, but also for machines. In the world of data, the log can be input source for automated processing (e.g. monitoring and alerting). Your log should not only be human-readable, but also be easy to parse. If you’re implementing a parsing service, below log format will save you much time:

2020-01-31 13:42:12,471 INFO Processing payment user_id=37162 transaction_id=38271 amount=32371 currency=USD

or, you can even log in JSON format:

{"time": "2020-01-31 13:42:12,471", "level": info, "msg": "Processing payment", "user_id": 37416, "transaction_id": 957126, "amount":3000, "currency": "USD"}

To archive this, instead of formatting the log yourself, you should attach the context data separately from your log message. Log formatter will handle the rest. Below are samples about how to do this in with logrus:

package main

import (
	log "github.com/sirupsen/logrus"
)

func main() {
	... 
    
	log.WithError(err).Error("Transaction failed")
	log.WithField("transaction", transaction).Debug("Transaction processed successfully")
	log.WithFields(log.Fields{
		"user_id": 2384172,
		"transaction_id": 9823183,
		"amount": 5000,
		"currency": "USD",
	}).Info("Transaction processed successfully")
	...
}

The code above seem terribly long at first. But once you got acquainted, it’s very helpful and much more flexible to attach context data. You can also create a logger object with initial context data. All logs coming from this logger will contain these info:

logger := log.WithFields(log.Fields{
	"user_id": 2384172,
	"transaction_id": 9823183,
	"amount": 5000,
	"currency": "USD",
})
    
...

logger.Info("Start transaction")
logger.WithError(err).Error("Transaction failed")

Log to stdout and stderr

Many cloud providers has their container engines capture logs output pipelines (stdout, stderr) of the contained application. The container wraps your message to a new log entry. Level of the new log entry depends on the pipeline where the original log came from. If it was received from stdout, the new log will be INFO. Otherwise, the new log will be ERROR. Container log is very helpful for troubleshooting, especially when you want to see combine logs, including external logs from cloud provider or other services.

logrus, by default, writes all logs to stderr. It means, all of your logs will appear in container logs with ERROR level. On GCP Kubernetes, all the logs will be highlighted in red. It’s not only confusing but also disable color filter, which help us to scan for real errors quickly. Sooner or later, we will need to stream logs to correct pipeline. Let’s say, all logs equals or higher WARN level will be streamed to stderr. Other logs will go to stdout. One solution to do this is setting a custom output for logrus:

import (
	"os"
	"regexp"

	log "github.com/sirupsen/logrus"
)

type LogWriter struct{}

var levelRegex *regexp.Regexp

const (
	LevelError   = "error"
	LevelWarning = "warning"
	LevelFatal   = "fatal"
	LevelPanic   = "panic"
)

func init() {
	var err error
	levelRegex, err = regexp.Compile("level=([a-z]+)")
	if err != nil {
		log.WithError(err).Fatal("Cannot setup log level")
	}
}

func (w *LogWriter) detectLogLevel(p []byte) (level string) {
	matches := levelRegex.FindStringSubmatch(string(p))
	if len(matches) > 1 {
		level = matches[1]
	}
	return
}

func (w *LogWriter) Write(p []byte) (n int, err error) {
	level := w.detectLogLevel(p)

	if level == LevelError || level == LevelWarning || level == LevelFatal || level == LevelPanic {
		return os.Stderr.Write(p)
	}
	return os.Stdout.Write(p)
}

...

func main() {
	log.SetOutput(&LogWriter{})
}

Log global default fields

If you are streaming logs from several services or instances into one destination (e.g. stack driver), you will soon need to put some extra fields for filtering purpose. For my project, I need to add service name, process ID, and environment (staging, production) to all entries. To archive this in logrus, I added a custom hook. Hook is a feature of logrus to trigger some actions when user write a log. This feature is very powerful, especially in cloud world, where we have many possible use cases for logs (add data, stream to cloud, report error, alert, etc.).

import (
	"os"

	"github.com/sirupsen/logrus"
)

type ExtraFieldHook struct {
	service string
	env     string
	pid     int
}

func NewExtraFieldHook(service string, env string) *ExtraFieldHook {
	return &ExtraFieldHook{
		service: service,
		env:     env,
		pid:     os.Getpid(),
	}
}

func (h *ExtraFieldHook) Levels() []logrus.Level {
	return logrus.AllLevels
}

func (h *ExtraFieldHook) Fire(entry *logrus.Entry) error {
	entry.Data["service"] = h.service
	entry.Data["env"] = h.env
	entry.Data["pid"] = h.pid
	return nil
}

...

func main() {
	log.AddHook(NewExtraFieldHook(service, config.Env))
	...
}

Log to stack driver and report error

Streaming all logs from all services to one place will help you centralize data for easier monitor and audit, especially when your services are deployed on different places.

logrus has several hooks for different third parties. Unfortunately, the one for stack driver does not seem to work. By diving deeper into its code, I found the hook is not using stack driver library from official doc. It looks like a legacy version of stack driver API. Therefore, I come up with my own version:

  • All logs will be streamed to stack driver
  • All logs with error or higher level will be reported to stack driver’s error reporting
  • Remove logrus’s stack from error’s stack trace. This is to prevent error reporting from grouping your errors into one
import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"regexp"
	"runtime"

	"cloud.google.com/go/errorreporting"
	"cloud.google.com/go/logging"
	"github.com/sirupsen/logrus"
)

type StackDriverHook struct {
	client      *logging.Client
	errorClient *errorreporting.Client
	logger      *logging.Logger
}

var logLevelMappings = map[logrus.Level]logging.Severity{
	logrus.TraceLevel: logging.Default,
	logrus.DebugLevel: logging.Debug,
	logrus.InfoLevel:  logging.Info,
	logrus.WarnLevel:  logging.Warning,
	logrus.ErrorLevel: logging.Error,
	logrus.FatalLevel: logging.Critical,
	logrus.PanicLevel: logging.Critical,
}

func NewStackDriverHook(logName string) (*StackDriverHook, error) {
	ctx := context.Background()

	client, err := logging.NewClient(ctx, config.GCPProjectID)
	if err != nil {
		return nil, err
	}

	errorClient, err := errorreporting.NewClient(ctx, config.GCPProjectID, errorreporting.Config{
		ServiceName: config.GCPProjectID,
		OnError: func(err error) {
			log.Printf("Could not log error: %v", err)
		},
	})
	if err != nil {
		return nil, err
	}

	return &StackDriverHook{
		client:      client,
		errorClient: errorClient,
		logger:      client.Logger(logName),
	}, nil
}

func (sh *StackDriverHook) Close() {
	sh.client.Close()
	sh.errorClient.Close()
}

func (sh *StackDriverHook) Levels() []logrus.Level {
	return logrus.AllLevels
}

func (sh *StackDriverHook) Fire(entry *logrus.Entry) error {
	payload := map[string]interface{}{
		"Message": entry.Message,
		"Data":    entry.Data,
	}
	level := logLevelMappings[entry.Level]
	sh.logger.Log(logging.Entry{Payload: payload, Severity: level})
	if int(level) >= int(logging.Error) {
		err := getError(entry)
		if err == nil {
			errData, e := json.Marshal(payload)
			if e != nil {
				fmt.Printf("Error %v", e)
			}
			fmt.Print(string(errData))
			err = fmt.Errorf(string(errData))
		}
		fmt.Println(err.Error())

		sh.errorClient.Report(errorreporting.Entry{
			Error: err,
			Stack: sh.getStackTrace(),
		})
	}
	return nil
}

func (sh *StackDriverHook) getStackTrace() []byte {
	stackSlice := make([]byte, 2048)
	length := runtime.Stack(stackSlice, false)
	stack := string(stackSlice[0:length])
	re := regexp.MustCompile("[\r\n].*logrus.*")
	res := re.ReplaceAllString(stack, "")
	return []byte(res)
}

type stackDriverError struct {
	Err         interface{}
	Code        interface{}
	Description interface{}
	Message     interface{}
	Env         interface{}
}

func (e stackDriverError) Error() string {
	return fmt.Sprintf("%v - %v - %v - %v - %v", e.Code, e.Description, e.Message, e.Err, e.Env)
}

func getError(entry *logrus.Entry) error {
	errData := entry.Data["error"]
	env := entry.Data["env"]
	code := entry.Data["ErrCode"]
	desc := entry.Data["ErrDescription"]
	msg := entry.Message

	err := stackDriverError{
		Err:         errData,
		Code:        code,
		Message:     msg,
		Description: desc,
		Env:         env,
	}

	return err
}

func (sh *StackDriverHook) Wait() {}

Conclusion

Above are some logging tips that I found during working with golang, logrus and GCP. The detailed implementation for other libraries or programming language may be different, but I hope that the main ideas are still valuable for you, especially when you’re deploying your app to GCP:

  • Log messages should contains context data, and should be friendly to both human and machine
  • Pay attention to your log output stream, it may affect to your container logs
  • Using hooks to add extra steps per log entry. Its use cases are a lot