Sensor-Based Reward Learning from Video Labels for Tumble Motion Control in a Household Dryer

To appear at 2026 IEEE 22nd International Conference on Automation Science and Engineering (CASE), August 2026

Jinwoo Lee, Chanseok Kang, Guntae Bae

LG Electronics AI Lab

TL;DR

This work transfers human video judgments of laundry motion into a deployable sensor-only reward model, then uses reinforcement learning to improve dryer drum-speed control without requiring video at runtime.

Learning-based reward model pipeline: visual supervision is collected during development, distilled into a sensor-only reward model, and used for reinforcement learning.

Overview

Problem

Efficient drying depends on maintaining cataracting laundry motion, but the internal tumble state is not directly observable from production sensors such as motor current and drum-speed signals.

Approach

The method collects synchronized sensor trajectories and internal video during development, uses human video labels or preferences to train a sensor-only reward model, and freezes that model during SAC-based policy learning.

Outcome

Across five load compositions, the learned controller improved normalized moisture-removal performance by 2.04% on average and 2.86% in the best case compared with an expert-designed baseline.

Motion Labels

Bad Motion: Wall-Following

The laundry mass stays attached to the drum wall instead of tumbling, reducing effective surface exposure to airflow.

Bad Motion: Rolling

The load rolls near the bottom of the drum rather than entering the desired falling motion.

Good Motion: Tumbling

The target motion exposes laundry surfaces more evenly to heated airflow and supports drying efficiency.

Method

Stage 1: Collect

Synchronized onboard sensor sequences and internal drum video are gathered on a real dryer.

Stage 2: Learn Reward

Video labels or pairwise preferences supervise an LSTM-based reward model that consumes sensor histories only.

Stage 3: Train Control

The frozen reward model supplies rewards to a SAC controller that adjusts drum speed from onboard sensors.

1. Sweep drum speed over the operating motor range while recording synchronized sensor and video data.
2. Annotate video clips as desirable or undesirable tumble motion, or compare clip pairs by relative motion quality.
3. Align annotations with sensor windows and train a sensor-only reward model.
4. Use the learned reward to train an incremental drum-speed controller with Soft Actor-Critic.
5. Deploy the final controller without video input; only production sensor streams are required.
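
The alignment step can be pictured with a minimal sketch. It assumes video clips and sensor samples share one clock, and that each sample is a (motor current, drum speed) pair. The names `ClipLabel` and `align_clip` and the sample layout are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sample layout: (motor_current, drum_speed).
SensorRow = Tuple[float, float]


@dataclass
class ClipLabel:
    start_s: float  # clip start on the shared clock, in seconds
    end_s: float    # clip end on the shared clock, in seconds
    good: bool      # True for tumbling, False for wall-following or rolling


def align_clip(
    sensor_times: List[float],
    sensor_rows: List[SensorRow],
    clip: ClipLabel,
) -> Tuple[List[SensorRow], int]:
    """Return the sensor samples inside the clip interval and a 0/1 label."""
    window = [
        row
        for t, row in zip(sensor_times, sensor_rows)
        if clip.start_s <= t < clip.end_s
    ]
    return window, int(clip.good)
```

Each resulting (window, label) pair would serve as one training example for the sensor-only reward model; the video itself is no longer needed once the label is attached.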

Motion Supervision

The key design is to use video only as development-time supervision. Human-readable tumble quality is distilled into a reward model that maps sensor histories to scalar motion quality, so the learned controller remains compatible with production sensing constraints.
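
As a rough illustration of how the two kinds of supervision could shape a scalar reward, the sketch below shows one common choice of loss for each: binary cross-entropy for good/bad labels and a Bradley-Terry loss for pairwise preferences. These loss forms are assumptions for illustration; the page does not state which losses the authors used, and the function names are hypothetical. The reward values stand in for the output of the sensor-only model on a sensor history.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_label_loss(reward: float, good: bool) -> float:
    """Cross-entropy treating sigmoid(reward) as P(motion is good)."""
    p = min(max(sigmoid(reward), 1e-12), 1.0 - 1e-12)
    return -math.log(p) if good else -math.log(1.0 - p)


def preference_loss(reward_preferred: float, reward_other: float) -> float:
    """Bradley-Terry loss: P(preferred beats other) = sigmoid(r_p - r_o)."""
    p = sigmoid(reward_preferred - reward_other)
    return -math.log(max(p, 1e-12))
```

Under either loss, only the sensor history enters the model; the human judgment appears solely as the training target.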

5 evaluated load compositions

2.04% average relative gain

2.86% best-case gain

Development Setup

Real household dryer setup used for collecting synchronized sensor streams, observing internal tumble motion, and evaluating the learned controller.

Results

Moisture-Removal Performance

The proposed controller improved the normalized moisture-removal metric for all five evaluated loads. The mean metric increased from 0.6129 to 0.6261, corresponding to a mean per-load relative improvement of 2.04%.
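
A short arithmetic check helps read these numbers. The relative gain computed from the two reported means is about 2.15%, which differs slightly from the reported 2.04% because the latter averages the relative gain of each load separately. The helper names below are hypothetical, and per-load values are not listed on this page, so only the gain of the means is evaluated.

```python
from typing import Sequence


def relative_gain_pct(baseline: float, learned: float) -> float:
    """Relative improvement of `learned` over `baseline`, in percent."""
    return (learned - baseline) / baseline * 100.0


def mean_per_load_gain_pct(
    baselines: Sequence[float], learned: Sequence[float]
) -> float:
    """Average of the per-load relative gains, in percent."""
    gains = [relative_gain_pct(b, l) for b, l in zip(baselines, learned)]
    return sum(gains) / len(gains)


# Reported mean metric values: baseline 0.6129, learned controller 0.6261.
gain_of_means = relative_gain_pct(0.6129, 0.6261)
print(f"gain of means: {gain_of_means:.2f}%")  # prints: gain of means: 2.15%
```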

Load Coverage

The experiments cover 3 kg and 5 kg mixed-fabric loads, different cotton-to-polyester ratios, and a towel-heavy 3 kg load.

Preference Reward Check

A preference-based reward model was also validated on the towel-heavy load. It followed the overall trend of binary motion labels and improved performance by 2.10%.

Deployment Constraint

Video is used only for annotation during development. During RL training and deployment, both the reward model and controller use onboard sensor sequences only.
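
The sensor-only data flow can be sketched as follows. This is a minimal illustration, not the authors' implementation: the speed limits, the step size, the assumption that the policy emits a value in [-1, 1], and all names are hypothetical. The point is that both the policy and the frozen reward model take only a sensor window as input.

```python
from typing import Callable, Sequence

SensorWindow = Sequence[Sequence[float]]  # recent sensor samples, oldest first

# Hypothetical operating limits, chosen only for illustration.
MIN_RPM = 40.0
MAX_RPM = 60.0
MAX_STEP_RPM = 1.0


def next_drum_speed(
    current_rpm: float,
    window: SensorWindow,
    policy: Callable[[SensorWindow], float],
) -> float:
    """Apply an incremental speed change chosen from sensor data alone."""
    delta = max(-1.0, min(1.0, policy(window)))  # clip the action to [-1, 1]
    target = current_rpm + delta * MAX_STEP_RPM
    return max(MIN_RPM, min(MAX_RPM, target))    # stay inside the speed range


def training_reward(
    window: SensorWindow,
    frozen_reward_model: Callable[[SensorWindow], float],
) -> float:
    """Reward used during RL training; computed from sensors, never video."""
    return frozen_reward_model(window)
```

At deployment only `next_drum_speed` would run; `training_reward` is needed only while the controller is being trained.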

Videos

Supplementary demonstration of dryer tumble-motion control.

Citation

@misc{lee2026sensorRewardDryer,
  title  = {Sensor-Based Reward Learning from Video Labels for Tumble Motion Control in a Household Dryer},
  author = {Lee, Jinwoo and Kang, Chanseok and Bae, Guntae},
  note   = {To appear at 2026 IEEE 22nd International Conference on Automation Science and Engineering (CASE)},
  year   = {2026}
}