Skip to main content

Chapter 7. The Evolution of Automation at Google

Besides black art, there is only automation and mechanization.

The Value of Automation​

  • Consistency (постоянство) - Π΅Π΄ΠΈΠ½ΠΎΠΎΠ±Ρ€Π°Π·Π½ΠΎΠ΅ Π²Ρ‹ΠΏΠΎΠ»Π½Π΅Π½ΠΈΠ΅ ΠΎΠ΄Π½ΠΈΡ… ΠΈ Ρ‚Π΅Ρ… ΠΆΠ΅ дСйствий.
  • A platform - ΠΊΠ°ΠΊ Сдиная Ρ‚ΠΎΡ‡ΠΊΠ° для Π°Π²Ρ‚ΠΎΠΌΠ°Ρ‚ΠΈΠ·Π°Ρ†ΠΈΠΈ ΠΈ устранСния ошибок
  • Faster Repairs - сниТСниС MTTR
  • Faster Action - e.g. скрипт ΠΊΠΎΡ‚ΠΎΡ€Ρ‹ΠΉ Ρ€Π΅ΡˆΠ°Π΅Ρ‚ ΠΏΡ€ΠΎΠ±Π»Π΅ΠΌΡƒ
  • Time Saving - ΠΏΡ€ΠΎΠ±Π»Π΅ΠΌΠ° быстро Ρ€Π΅ΡˆΠ°Π΅Ρ‚ΡΡ

A Hierarchy of Automation Classes​

1) No automation Database master is failed over manually between locations. 2) Externally maintained system-specific automation An SRE has a failover script in his or her home directory. 3) Externally maintained generic automation The SRE adds database support to a β€œgeneric failover” script that everyone uses. 4) Internally maintained system-specific automation The database ships with its own failover script. 5) Systems that don’t need any automation The database notices problems, and automatically fails over without human intervention.

  • Ρ€ΡƒΡ‡Π½Ρ‹Π΅ ΠΎΠΏΠ΅Ρ€Π°Ρ†ΠΈΠΈ
  • автоматизация (скрипты / ansible)
  • Π°Π²Ρ‚ΠΎΠ½ΠΎΠΌΠ½ΠΎΡΡ‚ΡŒ (borg)

Detecting Inconsistencies with Prodtest​

Prodtest - ΡŽΠ½ΠΈΡ‚-тСсты Π½Π° python ΠΊΠΎΡ‚ΠΎΡ€Ρ‹Π΅ ΠΏΡ€ΠΎΠ²Π΅Ρ€ΡΡŽΡ‚ ΠΊΠΎΠ½Ρ„ΠΈΠ³ΡƒΡ€Π°Ρ†ΠΈΡŽ кластСра Π½Π° соотвСствиС с ΠΆΠ΅Π»Π°Π΅ΠΌΠΎΠΉ

prod-test

Π‘ΠΎ Π²Ρ€Π΅ΠΌΠ΅Π½Π΅ΠΌ ΠΊ тСстом Π΄ΠΎΠ±Π°Π²ΠΈΠ»ΠΈ Ρ„ΡƒΠ½ΠΊΡ†ΠΈΡŽ исправлСния Π½Π°ΠΉΠ΄Π΅Π½Π½Ρ‹Ρ… ΠΏΡ€ΠΎΠ±Π»Π΅ΠΌ, запуски сдСлали ΠΈΠ΄Π΅ΠΌΠΏΠΎΡ‚Π΅Π½Ρ‚Π½Ρ‹ΠΌΠΈ, Ρ‡Ρ‚ΠΎ ΠΏΠΎΠ·Π²ΠΎΠ»ΠΈΠ»ΠΎ Π·Π°ΠΏΡƒΡΠΊΠ°Ρ‚ΡŒ сцСнарий пСриодичСски.

prod-test-fix

Key Insights​

Symlinks
note

Empty