fix(instance): POST /instance/pair returns empty pairing code instead of working code #39
The /manager dashboard previously showed only a static placeholder
("Dashboard content will be implemented here..."). This replaces it
with a standalone HTML page that fetches live data from the API and
displays real metrics:
- Total instances count
- Connected instances count and percentage
- Disconnected instances count
- Server health status (GET /server/ok)
- AlwaysOnline count
- Instance table with name, status badge, phone number, client and
AlwaysOnline indicator
- Auto-refresh every 30 seconds with manual refresh button
Implementation uses a standalone HTML file (Tailwind CDN + vanilla JS
fetch) served at GET /manager, keeping the existing compiled bundle
intact for all other routes (/manager/instances, /manager/login, etc.).
Changes:
- manager/dashboard/index.html: new self-contained dashboard page
- pkg/routes/routes.go: serve dashboard/index.html for GET /manager
(exact), keep dist/index.html for GET /manager/*any (wildcard)
- Dockerfile: copy manager/dashboard/ into the final image
- .gitignore: exclude manager build artifacts from version control
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
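The PR implements the exact-vs-wildcard split in Gin (`GET /manager` versus `GET /manager/*any`). As a dependency-free illustration of the same routing rule, and not the PR's code, the sketch below swaps in the standard library's `http.ServeMux`. There, a pattern without a trailing slash matches only that exact path, while a pattern ending in `/` matches the whole subtree. The handler bodies are placeholders for serving `dashboard/index.html` and `dist/index.html`, and `newManagerMux`/`servedFile` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newManagerMux registers the two routes: "/manager" matches only that exact
// path, while "/manager/" matches every path underneath it.
func newManagerMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/manager", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "dashboard/index.html") // placeholder for the new standalone dashboard
	})
	mux.HandleFunc("/manager/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "dist/index.html") // placeholder for the existing compiled bundle
	})
	return mux
}

// servedFile returns the body the mux produces for a GET request to path.
func servedFile(mux *http.ServeMux, path string) string {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Body.String()
}

func main() {
	mux := newManagerMux()
	for _, p := range []string{"/manager", "/manager/instances", "/manager/login"} {
		fmt.Printf("%s -> %s\n", p, servedFile(mux, p))
	}
}
```

Registering the exact route separately keeps the compiled bundle's client-side routes (`/manager/instances`, `/manager/login`, ...) untouched.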
Removes the '// TODO: not working' markers from the six chat endpoints (pin, unpin, archive, unarchive, mute, unmute). Investigation confirmed the implementation is correct: the endpoints work on fully-established sessions that have synced WhatsApp app state keys. The markers were likely added after testing on a fresh session where keys had not yet been distributed by the WhatsApp server. Also fixes the hardcoded 1-hour mute duration: the BodyStruct now accepts an optional `duration` field (seconds). Sending 0 or omitting the field mutes the chat indefinitely, matching WhatsApp's own behaviour.
Reject negative duration values with a 400-level validation error. Document that duration=0 maps to 'mute forever' (BuildMute treats 0 as a zero time.Duration, which causes BuildMuteAbs to set the WhatsApp sentinel timestamp of -1). Clamp duration to a maximum of 1 year (31536000 seconds) to avoid unreasonably large timestamps being sent to the WhatsApp API.
Adds GET /metrics serving the standard Prometheus text format. No authentication is required, following the Prometheus convention of protecting the endpoint at the network/ingress level.
Metrics exposed:
- evolution_instances_total: total registered instances (gauge)
- evolution_instances_connected: connected instances (gauge)
- evolution_instances_disconnected: disconnected instances (gauge)
- evolution_http_requests_total: HTTP requests by method/path/status (counter)
- evolution_http_request_duration_seconds: HTTP latency by method/path (histogram)
- evolution_build_info: always 1; the version label carries the value (gauge)
- evolution_uptime_seconds: seconds since server start (gauge)
Instance gauges use a custom Collector that queries the database on each scrape, so values are always current without event hooks. HTTP path labels use Gin's registered route patterns (e.g. /instance/:instanceId) to keep cardinality bounded regardless of how many distinct IDs appear in request paths.
New dependency: github.com/prometheus/client_golang v1.20.5
…volutionAPI#20 GET /instance/status was calling ensureClientConnected, which returns an error when the WhatsApp client exists but is not connected (e.g. after the user manually removes the device from their phone). This caused the endpoint to return HTTP 400 until the container was restarted, making it impossible for clients to detect the disconnected state without restarting the server. Status is a read-only query: it should report the current state, not require an active connection to do so. The fix reads clientPointer directly and returns Connected=false/LoggedIn=false when the client is nil or disconnected, without attempting reconnection. Fixes EvolutionAPI#20
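The read-only status behaviour described above can be sketched as follows. `waClient` is a hypothetical narrowing of the whatsmeow client to its `IsConnected`/`IsLoggedIn` checks. The `sync.RWMutex` guarding `clientPointer` is also an assumption about how concurrent access is handled, and the real `Status` returns `(*StatusStruct, error)`.

```go
package main

import (
	"fmt"
	"sync"
)

// waClient is a hypothetical interface exposing only the read-only checks.
type waClient interface {
	IsConnected() bool
	IsLoggedIn() bool
}

type StatusStruct struct {
	Connected bool
	LoggedIn  bool
}

// instances holds live clients by instance ID (the mutex is an assumption).
type instances struct {
	mu            sync.RWMutex
	clientPointer map[string]waClient
}

// Status reports the current state without attempting to (re)connect: a
// missing or disconnected client yields Connected=false/LoggedIn=false
// instead of an error.
func (i *instances) Status(instanceID string) StatusStruct {
	i.mu.RLock()
	client := i.clientPointer[instanceID]
	i.mu.RUnlock()

	if client == nil || !client.IsConnected() {
		return StatusStruct{}
	}
	return StatusStruct{Connected: true, LoggedIn: client.IsLoggedIn()}
}

// fakeClient stands in for a real WhatsApp client in this example.
type fakeClient struct{ connected, loggedIn bool }

func (f fakeClient) IsConnected() bool { return f.connected }
func (f fakeClient) IsLoggedIn() bool  { return f.loggedIn }

func main() {
	inst := &instances{clientPointer: map[string]waClient{
		"online":  fakeClient{connected: true, loggedIn: true},
		"removed": fakeClient{connected: false, loggedIn: false},
	}}
	for _, id := range []string{"online", "removed", "unknown"} {
		fmt.Printf("%s: %+v\n", id, inst.Status(id))
	}
}
```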
The Pair function was calling PairPhone directly without checking whether the client was initialized, and was silently swallowing errors from PairPhone. This caused two problems:
1. If the client was nil or disconnected, PairPhone returned an error, but the function ignored it and returned HTTP 200 with an empty PairingCode field, misleading the caller into thinking it had succeeded.
2. The client was never started before PairPhone was called. WhatsApp requires the client to be connected to the WA websocket (waiting for auth) before a pairing code can be generated.
Fix:
- Start the instance automatically if no active connection exists, mirroring the QR code flow
- Wait 3 seconds for the WA websocket connection to establish and the initial QR generation to begin (required by whatsmeow before PairPhone can be called)
- Reject early if the instance is already authenticated
- Return PairPhone errors to the caller instead of swallowing them, so the handler correctly responds with HTTP 500 and an actionable error message
Fixes EvolutionAPI#21
Reviewer's Guide

Implements a robust fix for /instance/pair by properly initializing and validating the WhatsApp client before pairing, improves status handling and the chat mute API, introduces Prometheus metrics and a new HTML dashboard, and exposes the metrics/manager assets via updated routes and Docker config.

Updated class diagram for metrics, instance repository and chat mute

```mermaid
classDiagram
class InstanceRepository {
<<interface>>
+Create(instance *Instance) error
+Get(instanceId string) (*Instance, error)
+GetAllConnectedInstances() []*Instance
+GetAllConnectedInstancesByClientName(clientName string) []*Instance
+GetAll(clientName string) []*Instance
+GetAllInstances() ([]*Instance, error)
+Delete(instanceId string) error
+GetAdvancedSettings(instanceId string) (*AdvancedSettings, error)
+UpdateAdvancedSettings(instanceId string, settings *AdvancedSettings) error
}
class instanceRepository {
-db *gorm.DB
+GetAllInstances() ([]*Instance, error)
+Delete(instanceId string) error
+GetAll(clientName string) ([]*Instance, error)
}
InstanceRepository <|.. instanceRepository
class Registry {
-reg *prometheus.Registry
-httpRequests *prometheus.CounterVec
-httpDuration *prometheus.HistogramVec
+New(version string, instanceRepo InstanceRepository) *Registry
+Handler() http.Handler
+GinMiddleware() gin.HandlerFunc
}
class instanceCollector {
-repo InstanceRepository
-descTotal *prometheus.Desc
-descConnected *prometheus.Desc
-descDisconnected *prometheus.Desc
+Describe(ch chan<- *prometheus.Desc)
+Collect(ch chan<- prometheus.Metric)
}
Registry --> instanceCollector : uses
instanceCollector --> InstanceRepository : queries
class chatService {
-loggerWrapper LoggerWrapper
+ChatMute(data *BodyStruct, instance *Instance) (string, error)
+ChatUnmute(data *BodyStruct, instance *Instance) (string, error)
+ensureClientConnected(instanceId string) (*WAClient, error)
}
class BodyStruct {
+Chat string
+Duration int64
}
chatService --> BodyStruct : parameter
class instances {
-clientPointer map[string]*WAClient
-loggerWrapper LoggerWrapper
-whatsmeowService WhatsmeowService
+Status(instance *Instance) (*StatusStruct, error)
+GetQr(instance *Instance) (*QrcodeStruct, error)
+Pair(data *PairStruct, instance *Instance) (*PairReturnStruct, error)
}
class StatusStruct {
+Connected bool
+LoggedIn bool
+myJid string
+Name string
}
class PairStruct {
+Phone string
}
class PairReturnStruct {
+PairingCode string
}
instances --> StatusStruct : returns
instances --> PairStruct : parameter
instances --> PairReturnStruct : returns
instances --> InstanceRepository : may use via services
class WAClient {
+IsConnected() bool
+IsLoggedIn() bool
+PairPhone(ctx context.Context, phone string, force bool, clientType whatsmeow.PairClientType, clientName string) (string, error)
+SendAppState(ctx context.Context, muteState appstate.MuteState) error
}
instances --> WAClient : uses clientPointer
chatService --> WAClient : uses ensureClientConnected
class WhatsmeowService {
+StartInstance(instanceId string) error
}
instances --> WhatsmeowService : uses to start instance
class LoggerWrapper {
+GetLogger(instanceId string) Logger
}
class Logger {
+LogInfo(format string, args interface)
+LogError(format string, args interface)
}
instances --> LoggerWrapper : uses for Pair
chatService --> LoggerWrapper : uses for ChatMute
class appstate {
+BuildMute(recipient JID, mute bool, duration time.Duration) MuteState
}
chatService --> appstate : BuildMute
class Instance {
+Id string
+Connected bool
+Name string
+ClientName string
+AlwaysOnline bool
}
instanceRepository --> Instance : manages
instanceCollector --> Instance : reads fields
```
Hey - I've found 2 issues, and left some high level feedback:
- In `instances.Pair`, the hardcoded `time.Sleep(3 * time.Second)` both blocks the request and assumes a fixed websocket/QR setup time; consider replacing this with an event-/state-based wait (or at least a configurable timeout + retry loop) so pairing is more robust under slow/fast connections.
- The new `instanceCollector.Collect` silently returns on `GetAllInstances` error, which can hide DB or repository issues; it would be safer to at least log the error (or expose it via Prometheus's error reporting patterns) so scrape failures are observable.
## Individual Comments
### Comment 1
<location path="pkg/instance/service/instance_service.go" line_range="501" />
<code_context>
+ return nil, fmt.Errorf("instance is already authenticated")
+ }
+
+ code, err := client.PairPhone(context.Background(), data.Phone, true, whatsmeow.PairClientChrome, "Chrome (Linux)")
if err != nil {
- i.loggerWrapper.GetLogger(instance.Id).LogError("[%s] something went wrong calling pair phone", instance.Id)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** PairPhone uses context.Background, which can cause runaway calls under slow or hung connections.
Because this call isn’t tied to a request or timeout, it may keep running after the client disconnects or during shutdown, leaving goroutines stuck. Prefer a bounded or request-scoped context (e.g. `context.WithTimeout` or the incoming request’s context) so the pairing attempt can be cancelled appropriately.
Suggested implementation:
```golang
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
code, err := client.PairPhone(ctx, data.Phone, true, whatsmeow.PairClientChrome, "Chrome (Linux)")
```
1. Ensure `time` is imported in this file, e.g. in the import block add:
- `import "time"` (or add `"time"` to the existing grouped import).
2. If the surrounding function already receives a request-scoped `context.Context` (e.g. `ctx`), prefer:
- `ctx, cancel := context.WithTimeout(ctx, 60*time.Second)` instead of using `context.Background()` directly.
3. You may want to replace the hard-coded `60*time.Second` with a named constant or configuration value if your project has standard timeouts for external calls.
</issue_to_address>
### Comment 2
<location path="pkg/metrics/metrics.go" line_range="24-27" />
<code_context>
+func New(version string, instanceRepo instance_repository.InstanceRepository) *Registry {
</code_context>
<issue_to_address>
**suggestion:** Metrics registry omits default Go/process collectors, which may limit observability.
Using a fresh `prometheus.NewRegistry()` without default collectors means `go_*` and `process_*` metrics (GC, goroutines, memory, FDs, etc.) won’t be exposed. If those are useful for operating this service, consider also registering:
```go
reg.MustRegister(
prometheus.NewGoCollector(),
prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{}),
)
```
If you intentionally want a minimal registry, this is acceptable, but it trades off runtime visibility.
```suggestion
func New(version string, instanceRepo instance_repository.InstanceRepository) *Registry {
reg := prometheus.NewRegistry()
reg.MustRegister(
prometheus.NewGoCollector(),
prometheus.NewProcessCollector(prometheus.ProcessCollectorOpts{}),
)
httpRequests := prometheus.NewCounterVec(prometheus.CounterOpts{
```
</issue_to_address>
Closes #21
Root cause
Two bugs in the `Pair` function:
1. Swallowed error: `PairPhone` returned an error (client nil or not connected), but the error was only logged; the function returned `HTTP 200` with `PairingCode: ""`, misleading the caller.
2. Client never started: whatsmeow requires the client to be connected to the WhatsApp websocket (in the "awaiting authentication" state) before `PairPhone` is called. The implementation called `PairPhone` directly without any setup.

Fix
- … `PairPhone`)
- Return the `PairPhone` error to the caller: the handler now responds with `HTTP 500` and an actionable message instead of `200` with an empty code

Validation
Tested locally with a real number: `POST /instance/pair` returned code `XXXXXXXX`, which was entered in WhatsApp → instance connected successfully.

Summary by Sourcery
Fix the phone pairing endpoint to correctly establish the WhatsApp client connection and surface errors, while adding observability and dashboard improvements.
New Features:
Bug Fixes:
Enhancements:
Build: