Add distributor which schedules tasks fairly #333

morningman merged 4 commits into apache:master from
Conversation
RoutineLoadJob routineLoadJob = routineLoad.getJobByTaskId(routineLoadTaskInfo.getSignature());
RoutineLoadTask routineLoadTask = null;
if (routineLoadTaskInfo instanceof KafkaTaskInfo) {
    routineLoadTask = new KafkaRoutineLoadTask(routineLoadJob.getResourceInfo(),
Wouldn't it be better to add an interface method for creating the LoadTask, like RoutineLoadJob.createTask(TaskInfo)?
Is this better? @imay

I think so, because it encapsulates the creation of KafkaRoutineLoadTask inside KafkaRoutineLoadJob. If we add a new type of job, we don't need to change the code here; we just add another RoutineLoadJob and RoutineLoadTask.
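The factory-method approach being discussed could look roughly like this. This is a minimal sketch with stubbed, assumed class shapes, not the actual Doris classes:

```java
// Minimal sketch (stubbed, assumed class shapes) of the suggested factory
// method: each RoutineLoadJob subclass creates its own task type, so the
// scheduler no longer needs instanceof checks.
abstract class RoutineLoadJob {
    abstract RoutineLoadTask createTask(RoutineLoadTaskInfo taskInfo);
}

class RoutineLoadTaskInfo {
    final long signature;
    RoutineLoadTaskInfo(long signature) { this.signature = signature; }
}

abstract class RoutineLoadTask { }

class KafkaRoutineLoadTask extends RoutineLoadTask {
    final long signature;
    KafkaRoutineLoadTask(long signature) { this.signature = signature; }
}

class KafkaRoutineLoadJob extends RoutineLoadJob {
    @Override
    RoutineLoadTask createTask(RoutineLoadTaskInfo taskInfo) {
        // Kafka-specific construction is encapsulated here
        return new KafkaRoutineLoadTask(taskInfo.signature);
    }
}
```

The scheduler would then just call routineLoadJob.createTask(taskInfo); adding a new job type means adding one subclass pair, with no change at the call site.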
routineLoad.processTimeOutTasks();

// get idle be task num
int totalIdleTaskNum = routineLoad.getTotalIdleTaskNum();
Idle task? Or a task waiting to be scheduled? Could you pick a clearer name?
// get idle be task num
int totalIdleTaskNum = routineLoad.getTotalIdleTaskNum();
int allocatedTaskNum = 0;
runningTask? I think 'allocate' is usually used for resources, and a task is scheduled, not allocated?
private RoutineLoad routineLoad = Catalog.getInstance().getRoutineLoadInstance();

@Override
protected void runOneCycle() {
I suggest you move this logic into a separate function, e.g. process(), and have this function call process() and catch all Throwables, so that this function cannot throw a RuntimeException such as NullPointerException.
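The suggested pattern could be sketched like this (class and method names are assumed for illustration):

```java
// Sketch of the suggested pattern (names assumed): runOneCycle() only
// delegates to process() and catches Throwable, so an unexpected
// RuntimeException (e.g. NullPointerException) cannot kill the daemon thread.
class RoutineLoadSchedulerSketch {
    private int cyclesSurvived = 0;

    protected void runOneCycle() {
        try {
            process();
        } catch (Throwable e) {
            // log and continue; the next cycle will retry
            System.err.println("Failed to process one round: " + e);
        }
        cyclesSurvived++;
    }

    private void process() {
        // scheduling work; may fail unexpectedly at runtime
        throw new NullPointerException("simulated failure");
    }

    int getCyclesSurvived() { return cyclesSurvived; }
}
```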
public RoutineLoadTaskInfo(long signature) {
    this.signature = signature;
    this.lock = new ReentrantReadWriteLock(true);
Does this simple class need a lock?

After discussing with morningman, I will share one lock among the different RoutineLoadTaskInfo instances, because the time spent holding the lock is negligible.
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Collectors;

public class RoutineLoad {
I think RoutineLoadManager is better.

Why Load and not LoadManager?

I think RoutineLoadManager is better too.
// diff beIds and beIdToMaxConcurrentTasks.keys()
List<Long> newBeIds = beIds.parallelStream().filter(entity -> beIdToMaxConcurrentTasks.get(entity) == null)
        .collect(Collectors.toList());
List<Long> decommissionBeIds = beIdToMaxConcurrentTasks.keySet().parallelStream()
Do not name it 'decommissionBeIds', because 'decommission' means the backend is being decommissioned. Just name it 'unavailableBeIds'.
    beIdToConcurrentTasks.remove(beId);
}
LOG.info("There are {} backends which participate in routine load scheduler. "
        + "There are {} new bes and {} decommission bes for routine load",

private Map<Long, RoutineLoadTaskInfo> idToRoutineLoadTask;
// KafkaPartitions means that partitions belong to one task
// kafka partitions == routine load task (logical)
private Queue<RoutineLoadTaskInfo> needSchedulerRoutineLoadTask;
needSchedulerRoutineLoadTask -> needSchedulerRoutineLoadTasks
runningTasks.removeAll(needSchedulerRoutineLoadTask);

for (RoutineLoadTaskInfo routineLoadTaskInfo : runningTasks) {
    routineLoadTaskInfo.writeLock();
It's highly recommended NOT to expose a lock outside a class. It will cause a lot of trouble.

And I can't see why you need a lock here. Nothing needs to be protected in RoutineLoadTaskInfo?

'idToRoutineLoadTask' is a member of RoutineLoad; it should not be protected by a lock inside RoutineLoadTaskInfo.

Logically, each task can only be in one operation at a time, either processTimeOutTasks or a task commit, so idToRoutineLoadTask really needs a segment lock. Since committing a task takes a long time, I will use one shared lock instead of one lock per task.
    idToRoutineLoadTask.put(kafkaTaskInfo.getSignature(), kafkaTaskInfo);
    needSchedulerRoutineLoadTask.add(kafkaTaskInfo);
}
LOG.debug("Task {} was ran more then {} minutes. It was removed and rescheduled");

for (Map.Entry<Long, Integer> entry : beIdToMaxConcurrentTasks.entrySet()) {
    if (beIdToConcurrentTasks.get(entry.getKey()) == null) {
        result = maxIdelTaskNum < entry.getValue() ? entry.getKey() : result;
        maxIdelTaskNum = Math.max(maxIdelTaskNum, entry.getValue());
maxIdelTaskNum: misspelling.

Please check all of the 'idel' misspellings...

I will pay attention next time =_=
if (routineLoadTaskInfo != null) {
    // when routine load task is not abandoned
    if (routineLoad.getIdToRoutineLoadTask().get(routineLoadTaskInfo.getSignature()) != null) {
        long beId = routineLoad.getMinTaskBeId();
What if routineLoad.getMinTaskBeId() returns 0, which means no backend has available slots to work?

It will not return 0, because clusterIdleSlotNum is more than 0.
        routineLoad.addNumOfConcurrentTasksByBeId(beId);
    }
} else {
    LOG.debug("Task {} for job already has been discarded", routineLoadTaskInfo.getSignature());
The correct grammar is 'has already been', not 'already has been'.
// TODO(ml): init load task
kafkaRoutineLoadTaskList.add(new KafkaRoutineLoadTask(getResourceInfo(), 0L, TTaskType.PUSH,
        dbId, tableId, 0L, 0L, 0L, SystemIdGenerator.getNextId()));
kafkaRoutineLoadTaskList.add(new KafkaTaskInfo(SystemIdGenerator.getNextId()));
SystemIdGenerator will produce duplicate ids if the Frontend restarts or the Master FE changes. Just using a UUID or a random Long is better, if you don't want to persist this info.

I think maybe I can reuse CatalogIdGenerator, but CatalogIdGenerator would need an additional name field.
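The reviewer's UUID suggestion could be sketched like this (the class name is an assumption, purely for illustration):

```java
import java.util.UUID;

// Sketch of the reviewer's suggestion: if the id does not need to be
// persisted, derive it from a UUID so that a Frontend restart or a Master
// FE change cannot hand out a duplicate id.
class TransientIdGenerator {
    static long nextId() {
        // mask to a non-negative long; 63 random bits make collisions
        // negligible for the number of tasks a scheduler creates
        return UUID.randomUUID().getLeastSignificantBits() & Long.MAX_VALUE;
    }
}
```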
List<RoutineLoadTask> routineLoadTaskList = null;
List<RoutineLoadTaskInfo> routineLoadTaskList = null;
try {
    routineLoadJob.writeLock();
Using routineLoadJob.writeLock() for protection here is weird. Also, a lock should be used like this (lock() sits outside the try{}):
lock();
try {
} finally {
    unlock();
}

Using routineLoadJob.writeLock() means that processing of different RoutineLoadJobs will not block each other.
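The idiom the reviewer asks for could be sketched as a complete example (class and field names assumed):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the requested idiom: lock() sits OUTSIDE the try, so the
// unlock() in finally only runs when the lock was actually acquired.
class LockIdiomSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private int value = 0;

    void increment() {
        lock.writeLock().lock();   // outside the try block
        try {
            value++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    int get() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

If lock() were inside the try and acquisition (or anything before it) threw, the finally block would call unlock() on a lock this thread never acquired, raising IllegalMonitorStateException on top of the original failure.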
382f4df to 38f90fd
Step 1: updateBeIdTaskMaps, remove dead BEs and add newly alive BEs.
Step 2: process timeout tasks; if a task has already been allocated to a BE but has not finished within DEFAULT_TASK_TIMEOUT_MINUTES, it will be discarded. At the same time, the partitions belonging to the old task will be allocated to a new task. The new task, with a new signature, will be added to the needSchedulerRoutineLoadTask queue.
Step 3: process all of needSchedulerRoutineLoadTask, allocating each task to a BE. The task will be executed by the backend.
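The three steps above can be sketched as follows. All names, types, and the timeout constant here are simplified assumptions, not the actual implementation:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Simplified sketch (all names and types assumed) of the three steps in
// the description: refresh BE maps, requeue timed-out tasks, then assign
// every queued task to a backend.
class SchedulerCycleSketch {
    static final long TASK_TIMEOUT_MS = 5 * 60 * 1000;

    final Queue<Long> needSchedulerTasks = new ArrayDeque<>();
    int scheduledCount = 0;

    void runOneCycle(List<Long> aliveBeIds, List<long[]> runningTasks /* {signature, startMs} */) {
        // Step 1: updateBeIdTaskMaps -- drop dead BEs, add newly alive ones
        updateBeIdTaskMaps(aliveBeIds);

        // Step 2: discard timed-out tasks; their partitions are requeued
        // under a fresh signature
        long now = System.currentTimeMillis();
        for (long[] task : runningTasks) {
            if (now - task[1] > TASK_TIMEOUT_MS) {
                needSchedulerTasks.add(task[0] + 1 /* stand-in for a new signature */);
            }
        }

        // Step 3: assign every queued task to a backend
        while (!needSchedulerTasks.isEmpty()) {
            needSchedulerTasks.poll();
            scheduledCount++;
        }
    }

    void updateBeIdTaskMaps(List<Long> aliveBeIds) { /* omitted in this sketch */ }
}
```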
public Map<Long, RoutineLoadTask> getIdToRoutineLoadTask() {
    return idToRoutineLoadTask;
public Map<Long, RoutineLoadTaskInfo> getIdToRoutineLoadTask() {
    readLock();
This lock protects nothing...
After the caller gets 'idToRoutineLoadTask', it can do anything with it without lock protection.
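One way to address this comment is to return a snapshot made while holding the read lock, instead of handing out the live map. A sketch with assumed names and simplified types:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch (names assumed): the getter copies the map under the read lock,
// so callers can never mutate or race on the internal state.
class RoutineLoadMapSketch {
    private final Map<Long, String> idToRoutineLoadTask = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    void putTask(long signature, String task) {
        lock.writeLock().lock();
        try {
            idToRoutineLoadTask.put(signature, task);
        } finally {
            lock.writeLock().unlock();
        }
    }

    Map<Long, String> getIdToRoutineLoadTask() {
        lock.readLock().lock();
        try {
            // snapshot: mutations by the caller cannot corrupt internal state
            return new HashMap<>(idToRoutineLoadTask);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```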
    }
}

public long getMinTaskBeId() {
It DOES return 0L in some cases.
}

public Queue<RoutineLoadTaskInfo> getNeedSchedulerRoutineLoadTasks() {
    readLock();
Still, this lock protects nothing.
    }
}

public void processTimeOutTasks() {

for (RoutineLoadTaskInfo routineLoadTaskInfo : runningTasks) {
    if ((System.currentTimeMillis() - routineLoadTaskInfo.getLoadStartTimeMs())
            > DEFAULT_TASK_TIMEOUT_MINUTES * 60 * 1000) {
Is 5 min too long? 10 sec, I think?
// judge nums of tasks more then max concurrent tasks of cluster
List<RoutineLoadTask> routineLoadTaskList = null;
List<RoutineLoadTaskInfo> routineLoadTaskList = null;
routineLoadJob.writeLock();

try {
    process();
} catch (Throwable e) {
    LOG.error("Failed to process one round of RoutineLoadTaskScheduler with error message {}",
The warn level is appropriate here.
for (RoutineLoadTaskInfo routineLoadTaskInfo : runningTasks) {
    if ((System.currentTimeMillis() - routineLoadTaskInfo.getLoadStartTimeMs())
            > DEFAULT_TASK_TIMEOUT_SECONDS * 60 * 1000) {

} finally {
    readUnlock();
    switch (jobState) {
        case NEED_SCHEDULER:
Why not just:
stateJobs = idToRoutineLoadJob.values().stream()
        .filter(entity -> entity.getState() == jobState)
        .collect(Collectors.toList());

The switch-case is unnecessary.
}

private void process() {
private void process() throws LoadException {
The default interval of the Daemon thread is 30 seconds, which means you have to wait at least 30 seconds to schedule the next batch of tasks? Maybe you need a trigger mechanism?
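A trigger mechanism like the one suggested could be sketched with a Condition (all names assumed): the daemon still wakes on its interval, but a producer can wake it immediately after enqueuing work.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch (names assumed) of a trigger: awaitWork() sleeps for at most the
// interval, but trigger() can end the wait early.
class TriggerableDaemonSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeup = lock.newCondition();

    /** Called by producers right after enqueuing a task. */
    void trigger() {
        lock.lock();
        try {
            wakeup.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if woken by a trigger, false on interval timeout. */
    boolean awaitWork(long intervalMs) throws InterruptedException {
        lock.lock();
        try {
            return wakeup.await(intervalMs, TimeUnit.MILLISECONDS);
        } finally {
            lock.unlock();
        }
    }
}
```

A real implementation should pair the condition with a "work pending" flag checked before waiting, because a signal sent while no thread is waiting is otherwise lost, and Condition.await is allowed to wake up spuriously.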
    default:
        break;
}
idToRoutineLoadJob.values().stream()
You missed assigning the 'stateJobs' variable....