Chapter 7. The Evolution of Automation at Google
Besides black art, there is only automation and mechanization.
The Value of Automationβ
- Consistency (ΠΏΠΎΡΡΠΎΡΠ½ΡΡΠ²ΠΎ) - Π΅Π΄ΠΈΠ½ΠΎΠΎΠ±ΡΠ°Π·Π½ΠΎΠ΅ Π²ΡΠΏΠΎΠ»Π½Π΅Π½ΠΈΠ΅ ΠΎΠ΄Π½ΠΈΡ ΠΈ ΡΠ΅Ρ ΠΆΠ΅ Π΄Π΅ΠΉΡΡΠ²ΠΈΠΉ.
- A platform - ΠΊΠ°ΠΊ Π΅Π΄ΠΈΠ½Π°Ρ ΡΠΎΡΠΊΠ° Π΄Π»Ρ Π°Π²ΡΠΎΠΌΠ°ΡΠΈΠ·Π°ΡΠΈΠΈ ΠΈ ΡΡΡΡΠ°Π½Π΅Π½ΠΈΡ ΠΎΡΠΈΠ±ΠΎΠΊ
- Faster Repairs - ΡΠ½ΠΈΠΆΠ΅Π½ΠΈΠ΅ MTTR
- Faster Action - e.g. ΡΠΊΡΠΈΠΏΡ ΠΊΠΎΡΠΎΡΡΠΉ ΡΠ΅ΡΠ°Π΅Ρ ΠΏΡΠΎΠ±Π»Π΅ΠΌΡ
- Time Saving - ΠΏΡΠΎΠ±Π»Π΅ΠΌΠ° Π±ΡΡΡΡΠΎ ΡΠ΅ΡΠ°Π΅ΡΡΡ
A Hierarchy of Automation Classesβ
1) No automation Database master is failed over manually between locations. 2) Externally maintained system-specific automation An SRE has a failover script in his or her home directory. 3) Externally maintained generic automation The SRE adds database support to a βgeneric failoverβ script that everyone uses. 4) Internally maintained system-specific automation The database ships with its own failover script. 5) Systems that donβt need any automation The database notices problems, and automatically fails over without human intervention.
- ΡΡΡΠ½ΡΠ΅ ΠΎΠΏΠ΅ΡΠ°ΡΠΈΠΈ
- Π°Π²ΡΠΎΠΌΠ°ΡΠΈΠ·Π°ΡΠΈΡ (ΡΠΊΡΠΈΠΏΡΡ / ansible)
- Π°Π²ΡΠΎΠ½ΠΎΠΌΠ½ΠΎΡΡΡ (borg)
Detecting Inconsistencies with Prodtestβ
Prodtest - ΡΠ½ΠΈΡ-ΡΠ΅ΡΡΡ Π½Π° python ΠΊΠΎΡΠΎΡΡΠ΅ ΠΏΡΠΎΠ²Π΅ΡΡΡΡ ΠΊΠΎΠ½ΡΠΈΠ³ΡΡΠ°ΡΠΈΡ ΠΊΠ»Π°ΡΡΠ΅ΡΠ° Π½Π° ΡΠΎΠΎΡΠ²Π΅ΡΡΠ²ΠΈΠ΅ Ρ ΠΆΠ΅Π»Π°Π΅ΠΌΠΎΠΉ
Π‘ΠΎ Π²ΡΠ΅ΠΌΠ΅Π½Π΅ΠΌ ΠΊ ΡΠ΅ΡΡΠΎΠΌ Π΄ΠΎΠ±Π°Π²ΠΈΠ»ΠΈ ΡΡΠ½ΠΊΡΠΈΡ ΠΈΡΠΏΡΠ°Π²Π»Π΅Π½ΠΈΡ Π½Π°ΠΉΠ΄Π΅Π½Π½ΡΡ ΠΏΡΠΎΠ±Π»Π΅ΠΌ, Π·Π°ΠΏΡΡΠΊΠΈ ΡΠ΄Π΅Π»Π°Π»ΠΈ ΠΈΠ΄Π΅ΠΌΠΏΠΎΡΠ΅Π½ΡΠ½ΡΠΌΠΈ, ΡΡΠΎ ΠΏΠΎΠ·Π²ΠΎΠ»ΠΈΠ»ΠΎ Π·Π°ΠΏΡΡΠΊΠ°ΡΡ ΡΡΠ΅Π½Π°ΡΠΈΠΉ ΠΏΠ΅ΡΠΈΠΎΠ΄ΠΈΡΠ΅ΡΠΊΠΈ.
Key Insightsβ
Symlinks
Accelerating SREs to On-Call and Beyond - (28)
note
Empty