Commit Graph

66 Commits

Author SHA1 Message Date
Matthew Sevey c45b9c41bd
format 2021-12-01 14:16:46 -05:00
Matthew Sevey 3e17c1a9ee
remove ping instead of disable 2021-12-01 12:07:15 -05:00
Matthew Sevey 3d11ca503f
Merge branch 'master' into sevey/disabl-load-check 2021-12-01 12:06:08 -05:00
Filip Rysavy 5ae447f9b2
Dump disk usage on health-checker critical disk space 2021-12-01 13:27:33 +01:00
Filip Rysavy f5b81d1287
Fix disabling portal in health checker 2021-11-26 16:38:53 +01:00
Matthew Sevey 31cf9fb59e
Disable load check until we have a process of actively addressing it 2021-11-24 14:13:26 -05:00
Karol Wypchlo 9026d56777
fix health check script invalid syntax
python lint job
2021-10-06 14:21:58 +02:00
Matthew Sevey b75c0c2c3b
Merge pull request #1178 from SkynetLabs/round-full-hours
round reporting datetime to full hours
2021-09-10 14:25:30 -04:00
Karol Wypchlo bcea4d5b90
do not send message on server down 2021-09-10 17:00:37 +02:00
Karol Wypchlo 4fddbb2fc8
do not notify on portal disabled 2021-09-10 16:23:05 +02:00
Karol Wypchlo 5ba4f54302
round reporting datetime to full hours 2021-09-10 13:24:19 +02:00
Karol Wypchlo 72f2b56f17
include skapps checks 2021-08-23 16:21:09 +02:00
Karol Wypchło 1c8816530c
cleanup unused discord import (#1073) 2021-08-17 09:58:00 +02:00
Karol Wypchło 379f87ea27
increase disk space size that warrants a warning (#1063) 2021-08-12 13:14:00 +02:00
Karol Wypchło 71f9d5280e
use webhook instead of discord bot to send messages (#979)
* initial refactor
* do not use before define
* forgot to remove client
* test notification
* add /cc
* fix /cc
* fix /cc role
* fix /cc
* test file upload
* test file upload
* test file upload
* default to no mentions
* unformat
* replace discord with DiscordWebhook
* add readme
* don't fail on failures in message send
2021-07-16 13:12:58 +02:00
Karol Wypchlo 36aa7c8311 improve health check reliability 2021-07-12 14:53:12 +02:00
Karol Wypchlo a2aa850632 improve health check reliability 2021-07-12 14:49:53 +02:00
Karol Wypchlo 7fd97b5824 improve health check reliability 2021-07-12 14:48:13 +02:00
Karol Wypchlo 49bb6dd2e2 fix portal size check reporting zero files 2021-06-15 11:41:01 +02:00
Karol Wypchło b8a6816876
fixed health check blowing up on eu-fin-3 (#838)
* request 127.0.0.1 over https - http localhost causes issues
* reformat with black
2021-06-07 15:08:18 +02:00
Karol Wypchlo cd7dac5b7e verbose => extended 2021-04-29 13:43:40 +02:00
Karol Wypchlo 1c99da3af8 fix repair string 2021-04-14 12:17:01 +02:00
Karol Wypchlo f48a8d9302 fix health-check 2021-04-13 16:19:42 +02:00
Matthew Sevey c752a17058
Update setup-scripts/health-checker.py 2021-02-03 10:30:27 -07:00
Matthew Sevey 50dff35da8
Update setup-scripts/health-checker.py
Co-authored-by: Marcin S. <scatman@bu.edu>
2021-02-03 10:22:24 -07:00
Matthew Sevey ff183beb66 Add repair size information to health checker 2021-02-03 09:42:55 -07:00
Karol Wypchło c0673b3f76
do not ping when server is in maintenance mode (#552) 2020-12-01 13:31:59 +01:00
Matthew Sevey 5f76d1ca52 remove error alert notification, subtract out siafile alerts 2020-11-24 07:49:51 -07:00
Karol Wypchlo 2dfb6d6a56 restore "or" 2020-11-24 15:26:51 +01:00
Karol Wypchlo 383144b7a6 tweak notifications on number of files in a node 2020-11-24 13:16:25 +01:00
Karol Wypchlo 7946f97d58 tweak notifications on error alerts 2020-11-24 13:08:08 +01:00
Ivaylo Novakov 41460f155f
Moved the container name var to the global space where it belongs. 2020-11-20 22:08:04 +01:00
Ivaylo Novakov 801597ccde
Fixed some typos.
Fixed formatting (force of habit...).
2020-11-20 21:45:19 +01:00
Matthew Sevey 05cd1bfb32 fix weird formatting 2020-11-20 11:46:35 -07:00
Matthew Sevey a337b754a8 run python format 2020-11-20 11:33:07 -07:00
Matthew Sevey efc6060924 scripts: update file health check to check siac output. Add total files check 2020-11-20 11:26:20 -07:00
Matthew Sevey 243d084b5d update message for siafile bad health 2020-11-18 11:04:04 -07:00
Matthew Sevey 09a4b646ec scripts: add alert check to the python scripts 2020-11-18 10:21:06 -07:00
Karol Wypchlo 1922c4cd98 use os.popen manually 2020-10-06 12:12:19 +02:00
Karol Wypchlo 9b6d61aa7e remove unnecessary time dependency 2020-10-06 11:27:06 +02:00
Karol Wypchlo 60f8371170 stop sia container on critical disk space threshold 2020-10-06 11:24:18 +02:00
Karol Wypchlo 2328e605b7 parse disk size as int before multiplying 2020-10-05 10:03:10 +02:00
Karol Wypchło e58752571e
add response content to health check failures (#437) 2020-09-30 16:20:55 +02:00
Karol Wypchło 10a251c081
reimplement health checks (#434) 2020-09-29 12:32:45 +02:00
Karol Wypchlo 20362fe7c5 fix health checks 2020-09-10 15:16:31 +02:00
Ivaylo Novakov 8235d75795
Only announce healthy status once a day. 2020-09-08 18:20:56 +02:00
Ivaylo Novakov ddf72ad850
Make the time comparisons in the health checker timezone-aware. 2020-09-08 18:07:33 +02:00
Ivaylo Novakov 2d032dbf17
Docstrings. 2020-09-07 17:59:39 +02:00
Ivaylo Novakov 0838e4f5e5
Add free disk space check to health-checker.py.
Move load-average check to health-checker.py.
2020-09-07 17:56:47 +02:00
Ivaylo Novakov 3f4742a436
Only notify the team if critical checks have failed. 2020-09-04 17:17:26 +02:00