Data For kubeflow/tf-operator

Only showing last 50 predictions


| # | Title | Body | Link | Prediction | Confidence | Labeled? | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1087 | [enhancement] Replace common with kubeflow/common | It is better to use kubeflow/common and remove the common package in tf-operator. /cc... | | feature_request | 0.97 | True | None | None |
| 1086 | Lack of documents for deployment | I want to deploy a tf-operator on my cluster, but I can't find documents about it. does anybody... | | question | 0.64 | True | None | None |
| 1079 | Performance problem about pod informer | ``` // Create pod informer. podInformer := kubeInformerFactory.Core().V1().Pods() // Set... | | feature_request | 0.44 | False | None | None |
| 1078 | [bug] Cannot initialize the training job when the user uses 1 worker and 0 PS | There are some users want to use TFJob to run local training jobs with Estimator. They will have... | | bug | 0.9 | True | None | None |
| 1077 | Separate cluster scoped and namespace scoped resources | Part of #1076 | | feature_request | 0.92 | True | None | None |
| 1076 | TFJob 1.0 | Description \| Category \| Status \| Issue -- \| -- \| -- \| -- Kustomize package \| Required \|... | | feature_request | 0.48 | False | None | None |
| 1068 | [bug] Keep tf-job-role as deprecated label in this version | Now we keep tf-job-name as the deprecated label. Thus I think we should keep the tf-job-role as well. | | bug | 0.72 | True | None | None |
| 1066 | GenLabels may select wrong Pods | https://github.com/kubeflow/tf-operator/pull/1064 https://github.com/kubeflow/pytorch-operator/p... | | bug | 0.91 | True | None | None |
| 1065 | Can I create a tf-operator pod without using GO? | I am unable to run my "TFJob pod" because I do not have a "tf-operator Pod". I have created a... | | bug | 0.45 | False | None | None |
| 1060 | tf-job-dashboard cannot work | After install Kubeflow 0.6, tf-job-dashboard cannot work <img width="869" alt="image"... | | bug | 0.96 | True | None | None |
| 1059 | [discussion] Should We Add CleanPodPolicy PS? | Now we have ``` CleanPodPolicyUndefined CleanPodPolicy = "" CleanPodPolicyAll ... | | question | 0.83 | True | None | None |
| 1058 | Refactor dockerfile | In tf-operator... | | feature_request | 0.71 | True | None | None |
| 1057 | remove v1beta1 in v0.5.3 cause incompatible issue when using go mod | We use tf-operator@v0.5.0 as our dependency, and go mod was try to get latest version of v0.5,... | | bug | 0.75 | True | None | None |
| 1056 | Invalid value: "v1beta1": must appear in spec.versions | **Environment:** k8s 1.14.2 kuberctl 1.14.2 ks 0.13.1 kubeflow 0.4.1 minikube... | | bug | 0.95 | True | None | None |
| 1053 | Example on EKS: Device or resource busy | Hi there, I'm trying... | | question | 0.43 | False | None | None |
| 1048 | can we add PriorityClassName when we create TF-job Podgroup? | i use kube-batch to schedule for tf-job, kube-batch support set the priorityclass of podgroup,... | | feature_request | 0.6 | True | None | None |
| 1045 | TFjob still running while chief pod is completed | Hello, I am using ` kubeflow.org/v1beta2` version and start a TFjob container only one chief... | | bug | 0.74 | True | None | None |
| 1039 | Is there any document for how to run TFJob in AllReduce Strategy | Hello guys, I want to know if there is any document about how to run tfjob in all-reduce strategy? | | question | 0.89 | True | None | None |
| 1035 | tf-operator version conficts | When running command **/opt/kubeflow/tf-operator.v1 -version** inside docker image... | | feature_request | 0.4 | False | None | None |
| 1033 | add E2E test for gang-scheduling | We now support the gang-scheduling with using kube-batchd and PodGroup but we don't have tests for it. | | feature_request | 0.98 | True | None | None |
| 1031 | gang schedule annotation | The annotation is need to be set when use gang scheduler as... | | feature_request | 0.81 | True | None | None |
| 1030 | [feature] Can we use one headless service for one job? | We have ps/worker/chief for one TFJob. And now we create one headless service for one replica. I... | | feature_request | 0.7 | True | None | None |
| 1029 | Will tf-operator upgrading k8s to 1.13? | I'm facing the problem that function **testing.NewPatchSubresourceAction** in... | | question | 0.71 | True | None | None |
| 1026 | no error log for create tfjob fail | use api to create tfjob , i get this error: "create tfjob err,the error is the server rejected... | | bug | 0.57 | True | None | None |
| 1024 | Creating tfjob in dashboard usability issues | - Invalid tfjob configurations do not display any errors. The user has to examine network... | | bug | 0.91 | True | None | None |
| 1019 | Deleting tf-job through the dashboard is not working | When i tried to delete a tf-job through tf-job dashboard. I get a message saying "Are you sure... | | bug | 0.95 | True | None | None |
| 1016 | Create common CRD validate and mutating webhook for all operator | If the spec of tfjob is invalid, we should reject the request when creating and also set default... | | feature_request | 0.86 | True | None | None |
| 1011 | Podgroup is constantly created and deleted after tfjob is success or failure | Podgroup is constantly created and deleted after tfjob is success or failure, As shown... | | bug | 0.96 | True | None | None |
| 1003 | Failed to update TFJob status in version v1 | I'm trying v0.5.1, after all pods & services created, there is an error msg: `error syncing... | | bug | 0.85 | True | None | None |
| 1000 | tfjob startTime should set immediately after create instead of wait pod of one replicaType are all running | When I create tfjob with activeDeadlineSeconds but the image address is wrong, the pod of tfjob... | | bug | 0.7 | True | None | None |
| 999 | Jobs failing when a node is preempted | On google kubernetes engine, I am finding that TFJobs fail when a node running a worker is... | | bug | 0.82 | True | None | None |
| 997 | tf-operator delete pod and service repeatedly | tf-operator delete pod and service repeatedly when the tfjob is success or fail or exceed... | | bug | 0.7 | True | None | None |
| 996 | error with kubeflow instalation | im installing toolkit for begin in kubernetes and with command... | | bug | 0.89 | True | None | None |
| 994 | tf-operator panic when cleanupTFJob | tf-operator panic when clean tfjob that exceeds limit, because the `CompletionTime` of tfjob is... | | bug | 0.93 | True | None | None |
| 991 | Update kustomize files for tf-operator v1 | | | feature_request | 0.84 | True | None | None |
| 990 | Create TFJob v1 documentation | | | feature_request | 0.92 | True | None | None |
| 989 | Create TFJob v1 API and controller from v1beta2 | TFjob v1beta2 has been stable for a while. We can create the v1 version now. | | feature_request | 0.96 | True | None | None |
| 988 | Prometheus support in TF Job | Reference: https://github.com/kubeflow/common/issues/22 Since common repo is not ready yet,... | | feature_request | 0.74 | True | None | None |
| 987 | MasterRole label initialization | I see that masterRole re-initialization is removed inside loop in this PR... | | bug | 0.42 | False | None | None |
| 985 | Shall we consider upgrading k8s to 1.11.3 | I noticed the tf-operator version are using k8s v1.11.2 Are we going to upgrading it to... | | question | 0.83 | True | None | None |
| 980 | TFJob Dashboard is not support pvc | Hi all, When submitting several TFjobs, I need add cephfs pvc, I can't choose pvc on volume... | | bug | 0.45 | False | 0 | 0 |
| 976 | ERROR handle object: patching object from cluster: merging object with existing state: unable to recognize "/var/folders/tl/zzfcr4zs53vgnpqqjq4n08sh0000gn/T/ksonnet-mergepatch020443124": no matches for kind "TFJob" in version "kubeflow.org/v1beta1" | I am trying to follow the guide here: https://www.kubeflow.org/docs/gke/gcp-e2e/ However, when... | | question | 0.67 | True | 1 | 1 |
| 975 | Can not create tfjob using examples/v1beta1/dist-mnist/tf_job_mnist.yaml in self-created | I can not link gci , so i use the image jackfantasy/tf-dist-mnist-test:1.0 The k8s is... | | bug | 0.82 | True | 1 | 0 |