Uses of Checkpoint/Restart
Gang scheduling
● No queue drain needed for maintenance or policy changes
● Higher utilization and/or more flexible scheduling
Process migration
● Save job if node failure imminent
● Pack jobs for optimal network performance
Periodic backup
● Not our main focus
● The application can always do this more efficiently itself
● But may be useful for systems with long jobs, fast I/O, and/or high node failure rates